Preface


The language and concepts of matrix theory and, more generally, of linear 
algebra have come into widespread usage in the social and natural sciences, 
computer science, and statistics. In addition, linear algebra continues to be 
of great importance in modern treatments of geometry and analysis. 

The primary purpose of this fourth edition of Linear Algebra is to present 
a careful treatment of the principal topics of linear algebra and to illustrate 
the power of the subject through a variety of applications. Our major thrust 
emphasizes the symbiotic relationship between linear transformations and 
matrices. However, where appropriate, theorems are stated in the more gen- 
eral infinite-dimensional case. For example, this theory is applied to finding 
solutions to a homogeneous linear differential equation and the best approx- 
imation by a trigonometric polynomial to a continuous function. 

Although the only formal prerequisite for this book is a one-year course 
in calculus, it requires the mathematical sophistication of typical junior and 
senior mathematics majors. This book is especially suited for a second course 
in linear algebra that emphasizes abstract vector spaces, although it can be 
used in a first course with a strong theoretical emphasis. 

The book is organized to permit a number of different courses (ranging 
from three to eight semester hours in length) to be taught from it. The 
core material (vector spaces, linear transformations and matrices, systems of 
linear equations, determinants, diagonalization, and inner product spaces) is 
found in Chapters 1 through 5 and Sections 6.1 through 6.5. Chapters 6 and 
7, on inner product spaces and canonical forms, are completely independent 
and may be studied in either order. In addition, throughout the book are 
applications to such areas as differential equations, economics, geometry, and 
physics. These applications are not central to the mathematical development, 
however, and may be excluded at the discretion of the instructor. 

We have attempted to make it possible for many of the important topics 
of linear algebra to be covered in a one-semester course. This goal has led 
us to develop the major topics with fewer preliminaries than in a traditional 
approach. (Our treatment of the Jordan canonical form, for instance, does 
not require any theory of polynomials.) The resulting economy permits us to 
cover the core material of the book (omitting many of the optional sections 
and a detailed discussion of determinants) in a one-semester four-hour course 
for students who have had some prior exposure to linear algebra. 

Chapter 1 of the book presents the basic theory of vector spaces: sub- 
spaces, linear combinations, linear dependence and independence, bases, and 
dimension. The chapter concludes with an optional section in which we prove 
that every infinite-dimensional vector space has a basis.

Linear transformations and their relationship to matrices are the subject 
of Chapter 2. We discuss the null space and range of a linear transformation, 
matrix representations of a linear transformation, isomorphisms, and change 
of coordinates. Optional sections on dual spaces and homogeneous linear 
differential equations end the chapter. 

The application of vector space theory and linear transformations to sys- 
tems of linear equations is found in Chapter 3. We have chosen to defer this 
important subject so that it can be presented as a consequence of the pre- 
ceding material. This approach allows the familiar topic of linear systems to 
illuminate the abstract theory and permits us to avoid messy matrix computa- 
tions in the presentation of Chapters 1 and 2. There are occasional examples 
in these chapters, however, where we solve systems of linear equations. (Of 
course, these examples are not a part of the theoretical development.) The 
necessary background is contained in Section 1.4. 

Determinants, the subject of Chapter 4, are of much less importance than 
they once were. In a short course (less than one year), we prefer to treat 
determinants lightly so that more time may be devoted to the material in 
Chapters 5 through 7. Consequently we have presented two alternatives in 
Chapter 4—a complete development of the theory (Sections 4.1 through 4.3) 
and a summary of important facts that are needed for the remaining chapters 
(Section 4.4). Optional Section 4.5 presents an axiomatic development of the 
determinant. 

Chapter 5 discusses eigenvalues, eigenvectors, and diagonalization. One of 
the most important applications of this material occurs in computing matrix 
limits. We have therefore included an optional section on matrix limits and 
Markov chains in this chapter even though the most general statement of some 
of the results requires a knowledge of the Jordan canonical form. Section 5.4 
contains material on invariant subspaces and the Cayley-Hamilton theorem. 

Inner product spaces are the subject of Chapter 6. The basic mathematical
theory (inner products; the Gram-Schmidt process; orthogonal complements;
the adjoint of an operator; normal, self-adjoint, orthogonal and
unitary operators; orthogonal projections; and the spectral theorem) is con- 
tained in Sections 6.1 through 6.6. Sections 6.7 through 6.11 contain diverse 
applications of the rich inner product space structure. 

Canonical forms are treated in Chapter 7. Sections 7.1 and 7.2 develop 
the Jordan canonical form, Section 7.3 presents the minimal polynomial, and 
Section 7.4 discusses the rational canonical form. 

There are five appendices. The first four, which discuss sets, functions, 
fields, and complex numbers, respectively, are intended to review basic ideas 
used throughout the book. Appendix E on polynomials is used primarily 
in Chapters 5 and 7, especially in Section 7.4. We prefer to cite particular 
results from the appendices as needed rather than to discuss the appendices 
independently.
The following diagram illustrates the dependencies among the various 
chapters. 


Chapter 1
   ↓
Chapter 2
   ↓
Chapter 3
   ↓
Sections 4.1–4.3 or Section 4.4
   ↓
Sections 5.1 and 5.2 → Chapter 6
   ↓
Section 5.4
   ↓
Chapter 7


One final word is required about our notation. Sections and subsections
labeled with an asterisk (*) are optional and may be omitted as the instructor
sees fit. An exercise accompanied by the dagger symbol (†) is not optional,
however: we use this symbol to identify an exercise that is cited in some later
section that is not optional.


DIFFERENCES BETWEEN THE THIRD AND FOURTH EDITIONS 


The principal content change of this fourth edition is the inclusion of a 
new section (Section 6.7) discussing the singular value decomposition and 
the pseudoinverse of a matrix or a linear transformation between finite- 
dimensional inner product spaces. Our approach is to treat this material as 
a generalization of our characterization of normal and self-adjoint operators. 

The organization of the text is essentially the same as in the third edition. 
Nevertheless, this edition contains many significant local changes that
improve the book. Section 5.1 (Eigenvalues and Eigenvectors) has been
streamlined, and some material previously in Section 5.1 has been moved to
Section 2.5 (The Change of Coordinate Matrix). Further improvements include
revised proofs of some theorems, additional examples, new exercises, and 
literally hundreds of minor editorial changes. 

We are especially indebted to Jane M. Day (San Jose State University) 
for her extensive and detailed comments on the fourth edition manuscript. 
Additional comments were provided by the following reviewers of the fourth 
edition manuscript: Thomas Banchoff (Brown University), Christopher Heil 
(Georgia Institute of Technology), and Thomas Shemanske (Dartmouth Col- 
lege). 

To find the latest information about this book, consult our web site on 
the World Wide Web. We encourage comments, which can be sent to us by 
e-mail or ordinary post. Our web site and e-mail addresses are listed below. 


web site: http://www.math.ilstu.edu/linalg 


e-mail: linalg@math.ilstu.edu 


Stephen H. Friedberg 
Arnold J. Insel 
Lawrence E. Spence 


Vector Spaces 


1.1 Introduction 

1.2 Vector Spaces 

1.3 Subspaces

1.4 Linear Combinations and Systems of Linear Equations 
1.5 Linear Dependence and Linear Independence 

1.6 Bases and Dimension 

1.7* Maximal Linearly Independent Subsets 


1.1 INTRODUCTION


Many familiar physical notions, such as forces, velocities,¹ and accelerations,
involve both a magnitude (the amount of the force, velocity, or acceleration) 
and a direction. Any such entity involving both magnitude and direction is 
called a “vector.” A vector is represented by an arrow whose length denotes 
the magnitude of the vector and whose direction represents the direction of 
the vector. In most physical situations involving vectors, only the magnitude 
and direction of the vector are significant; consequently, we regard vectors 
with the same magnitude and direction as being equal irrespective of their 
positions. In this section the geometry of vectors is discussed. This geometry 
is derived from physical experiments that test the manner in which two vectors 
interact. 

Familiar situations suggest that when two like physical quantities act si- 
multaneously at a point, the magnitude of their effect need not equal the sum 
of the magnitudes of the original quantities. For example, a swimmer swim- 
ming upstream at the rate of 2 miles per hour against a current of 1 mile per 
hour does not progress at the rate of 3 miles per hour. For in this instance 
the motions of the swimmer and current oppose each other, and the rate of 
progress of the swimmer is only 1 mile per hour upstream. If, however, the
swimmer is moving downstream (with the current), then his or her rate of
progress is 3 miles per hour downstream.

¹The word velocity is being used here in its scientific sense, as an entity having
both magnitude and direction. The magnitude of a velocity (without regard for the
direction of motion) is called its speed.

Experiments show that if two like quantities act together, their effect is 
predictable. In this case, the vectors used to represent these quantities can be 
combined to form a resultant vector that represents the combined effects of 
the original quantities. This resultant vector is called the sum of the original 
vectors, and the rule for their combination is called the parallelogram law. 
(See Figure 1.1.) 


Figure 1.1 


Parallelogram Law for Vector Addition. The sum of two vectors 
x and y that act at the same point P is the vector beginning at P that is 
represented by the diagonal of the parallelogram having x and y as adjacent sides.


Since opposite sides of a parallelogram are parallel and of equal length, the 
endpoint Q of the arrow representing x + y can also be obtained by allowing
x to act at P and then allowing y to act at the endpoint of x. Similarly, the 
endpoint of the vector x + y can be obtained by first permitting y to act at 
P and then allowing x to act at the endpoint of y. Thus two vectors x and 
y that both act at the point P may be added “tail-to-head”; that is, either 
x or y may be applied at P and a vector having the same magnitude and 
direction as the other may be applied to the endpoint of the first. If this is 
done, the endpoint of the second vector is the endpoint of x + y. 

The addition of vectors can be described algebraically with the use of 
analytic geometry. In the plane containing x and y, introduce a coordinate 
system with P at the origin. Let $(a_1, a_2)$ denote the endpoint of x and $(b_1, b_2)$
denote the endpoint of y. Then as Figure 1.2(a) shows, the endpoint Q of $x + y$
is $(a_1 + b_1, a_2 + b_2)$. Henceforth, when a reference is made to the coordinates
of the endpoint of a vector, the vector should be assumed to emanate from 
the origin. Moreover, since a vector beginning at the origin is completely 
determined by its endpoint, we sometimes refer to the point x rather than 
the endpoint of the vector x if x is a vector emanating from the origin. 

Figure 1.2

Besides the operation of vector addition, there is another natural operation
that can be performed on vectors: the length of a vector may be magnified
or contracted. This operation, called scalar multiplication, consists of
multiplying the vector by a real number. If the vector x is represented by an
arrow, then for any real number t, the vector tx is represented by an arrow in
the same direction if $t > 0$ and in the opposite direction if $t < 0$. The length
of the arrow tx is $|t|$ times the length of the arrow x. Two nonzero vectors
x and y are called parallel if $y = tx$ for some nonzero real number t. (Thus
nonzero vectors having the same or opposite directions are parallel.)

To describe scalar multiplication algebraically, again introduce a coordi- 
nate system into a plane containing the vector x so that x emanates from the 
origin. If the endpoint of x has coordinates $(a_1, a_2)$, then the coordinates of
the endpoint of tx are easily seen to be $(ta_1, ta_2)$. (See Figure 1.2(b).)

The algebraic descriptions of vector addition and scalar multiplication for 
vectors in a plane yield the following properties: 


1. For all vectors x and y, $x + y = y + x$.

2. For all vectors x, y, and z, $(x + y) + z = x + (y + z)$.

3. There exists a vector denoted 0 such that $x + 0 = x$ for each vector x.

4. For each vector x, there is a vector y such that $x + y = 0$.

5. For each vector x, $1x = x$.

6. For each pair of real numbers a and b and each vector x, $(ab)x = a(bx)$.

7. For each real number a and each pair of vectors x and y, $a(x + y) = ax + ay$.

8. For each pair of real numbers a and b and each vector x, $(a + b)x = ax + bx$.
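To make the list concrete, the following Python sketch spot-checks properties 1 through 8 for vectors in the plane, using the coordinate descriptions of addition and scalar multiplication given above. It is a numerical check on a few sample vectors and scalars, not a proof. The helper names `add`, `scale`, and `check_properties` are hypothetical, and exact `Fraction` arithmetic is assumed so that the equality tests are not disturbed by floating-point rounding.

```python
from fractions import Fraction as Fr

# A plane vector is represented by the coordinates of its endpoint
# (the vector is assumed to emanate from the origin).
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(t, x):
    return (t * x[0], t * x[1])

ZERO = (Fr(0), Fr(0))

def check_properties(x, y, z, a, b):
    """Return True if properties 1-8 hold for these particular samples."""
    neg_x = scale(Fr(-1), x)  # candidate for the vector y in property 4
    return all([
        add(x, y) == add(y, x),                                # 1
        add(add(x, y), z) == add(x, add(y, z)),                # 2
        add(x, ZERO) == x,                                     # 3
        add(x, neg_x) == ZERO,                                 # 4
        scale(Fr(1), x) == x,                                  # 5
        scale(a * b, x) == scale(a, scale(b, x)),              # 6
        scale(a, add(x, y)) == add(scale(a, x), scale(a, y)),  # 7
        scale(a + b, x) == add(scale(a, x), scale(b, x)),      # 8
    ])

x, y, z = (Fr(2), Fr(-1)), (Fr(1, 2), Fr(3)), (Fr(-4), Fr(5, 3))
print(check_properties(x, y, z, Fr(3), Fr(-2, 7)))  # True
```

Checking finitely many samples can only expose a failure; the general argument is the coordinatewise one sketched in the text.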


Arguments similar to the preceding ones show that these eight properties, 
as well as the geometric interpretations of vector addition and scalar multipli- 
cation, are true also for vectors acting in space rather than in a plane. These 
results can be used to write equations of lines and planes in space. 
Consider first the equation of a line in space that passes through two
distinct points A and B. Let O denote the origin of a coordinate system in
space, and let u and v denote the vectors that begin at O and end at A and
B, respectively. If w denotes the vector beginning at A and ending at B, then
“tail-to-head” addition shows that $u + w = v$, and hence $w = v - u$, where $-u$
denotes the vector $(-1)u$. (See Figure 1.3, in which the quadrilateral OABC
is a parallelogram.) Since a scalar multiple of w is parallel to w but possibly
of a different length than w, any point on the line joining A and B may be
obtained as the endpoint of a vector that begins at A and has the form tw
for some real number t. Conversely, the endpoint of every vector of the form
tw that begins at A lies on the line joining A and B. Thus an equation of the
line through A and B is $x = u + tw = u + t(v - u)$, where t is a real number
and x denotes an arbitrary point on the line. Notice also that the endpoint
C of the vector $v - u$ in Figure 1.3 has coordinates equal to the difference of
the coordinates of B and A.


Figure 1.3 


Example 1 

Let A and B be points having coordinates (-2, 0, 1) and (4, 5, 3), respectively.
The endpoint C of the vector emanating from the origin and having the same
direction as the vector beginning at A and terminating at B has coordinates
(4, 5, 3) - (-2, 0, 1) = (6, 5, 2). Hence the equation of the line through A and
B is

$$x = (-2, 0, 1) + t(6, 5, 2).$$ ♦
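The computation in Example 1 can be sketched in Python as follows, with points given as coordinate tuples. The function names `direction` and `line_point` are hypothetical. `direction` returns the coordinates of C, the difference of the coordinates of B and A, and `line_point` evaluates $x = A + t(B - A)$ for a given real t.

```python
def direction(A, B):
    """Coordinates of C, the endpoint of the vector from the origin parallel to AB."""
    return tuple(b - a for a, b in zip(A, B))

def line_point(A, B, t):
    """The point x = A + t(B - A) on the line through A and B."""
    w = direction(A, B)
    return tuple(a + t * wi for a, wi in zip(A, w))

A, B = (-2, 0, 1), (4, 5, 3)
print(direction(A, B))        # (6, 5, 2)
print(line_point(A, B, 0))    # (-2, 0, 1), which is A
print(line_point(A, B, 1))    # (4, 5, 3), which is B
print(line_point(A, B, 0.5))  # (1.0, 2.5, 2.0), the midpoint of AB
```

Taking $t = 0$ and $t = 1$ recovers A and B, which is a quick check that the equation describes the intended line.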


Now let A, B, and C denote any three noncollinear points in space. These 
points determine a unique plane, and its equation can be found by use of our 
previous observations about vectors. Let u and v denote vectors beginning at 
A and ending at B and C, respectively. Observe that any point in the plane 
containing A, B, and C is the endpoint S of a vector x beginning at A and 
having the form $su + tv$ for some real numbers s and t. The endpoint of $su$ is
the point of intersection of the line through A and B with the line through S
parallel to the line through A and C. (See Figure 1.4.) A similar procedure
locates the endpoint of $tv$. Moreover, for any real numbers s and t, the vector
$su + tv$ lies in the plane containing A, B, and C. It follows that an equation
of the plane containing A, B, and C is

$$x = A + su + tv,$$

where s and t are arbitrary real numbers and x denotes an arbitrary point in
the plane.

Figure 1.4


Example 2 


Let A, B, and C be the points having coordinates (1, 0, 2), (-3, -2, 4), and
(1, 8, -5), respectively. The endpoint of the vector emanating from the origin
and having the same length and direction as the vector beginning at A and
terminating at B is

$$(-3, -2, 4) - (1, 0, 2) = (-4, -2, 2).$$

Similarly, the endpoint of a vector emanating from the origin and having the
same length and direction as the vector beginning at A and terminating at C
is (1, 8, -5) - (1, 0, 2) = (0, 8, -7). Hence the equation of the plane containing
the three given points is

$$x = (1, 0, 2) + s(-4, -2, 2) + t(0, 8, -7).$$ ♦
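A Python sketch of Example 2, with points as coordinate tuples, is given below. The function `plane_through` is a hypothetical helper. It returns the two difference vectors u and v together with a function that evaluates $x = A + su + tv$. Taking $(s, t) = (1, 0)$ and $(0, 1)$ should recover B and C.

```python
def plane_through(A, B, C):
    """Return u = B - A, v = C - A, and a function (s, t) -> A + su + tv."""
    u = tuple(b - a for a, b in zip(A, B))  # vector from A to B, moved to the origin
    v = tuple(c - a for a, c in zip(A, C))  # vector from A to C, moved to the origin

    def point(s, t):
        return tuple(a + s * ui + t * vi for a, ui, vi in zip(A, u, v))

    return u, v, point

u, v, point = plane_through((1, 0, 2), (-3, -2, 4), (1, 8, -5))
print(u, v)         # (-4, -2, 2) (0, 8, -7)
print(point(1, 0))  # (-3, -2, 4), which is B
print(point(0, 1))  # (1, 8, -5), which is C
```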


Any mathematical structure possessing the eight properties on page 3 is 
called a vector space. In the next section we formally define a vector space 
and consider many examples of vector spaces other than the ones mentioned 
above. 


EXERCISES 


1. Determine whether the vectors emanating from the origin and termi- 
nating at the following pairs of points are parallel. 
(a) (3, 1, 2) and (6, 4, 2)
(b) (-3, 1, 7) and (9, -3, -21)
(c) (5, -6, 7) and (-5, 6, -7)
(d) (2, 0, -5) and (5, 0, -2)


2. Find the equations of the lines through the following pairs of points in space.

(a) (3, 2, 4) and (-5, 7,
(b) (2, 4, 0) and (-3, -6,
(c) (3, 7, 2) and (3, 7, -8)
(d) (-2, -1, 5) and (3, 9, 7)


3. Find the equations of the planes containing the following points in space. 


(a) (2, -5, -1), (0, 4, 6), and (-3, 7, 1)
(b) (3, -6, 7), (-2, 0, -4), and (5, -9, -2)
(c) (-8, 2, 0), (1, 3, 0), and (6, -5, 0)
(d) (1, 1, 1), (5, 5, 5), and (-6, 4, 2)


4. What are the coordinates of the vector 0 in the Euclidean plane that 
satisfies property 3 on page 3? Justify your answer. 


5. Prove that if the vector x emanates from the origin of the Euclidean
plane and terminates at the point with coordinates $(a_1, a_2)$, then the
vector tx that emanates from the origin terminates at the point with
coordinates $(ta_1, ta_2)$.


6. Show that the midpoint of the line segment joining the points (a, b) and 
(c,d) is ((a+ c)/2,(b+ d)/2). 


7. Prove that the diagonals of a parallelogram bisect each other. 


1.2 VECTOR SPACES 


In Section 1.1, we saw that with the natural definitions of vector addition and 
scalar multiplication, the vectors in a plane satisfy the eight properties listed 
on page 3. Many other familiar algebraic systems also permit definitions of 
addition and scalar multiplication that satisfy the same eight properties. In 
this section, we introduce some of these systems, but first we formally define 
this type of algebraic structure. 


Definitions. A vector space (or linear space) V over a field² F
consists of a set on which two operations (called addition and scalar
multiplication, respectively) are defined so that for each pair of elements x, y
in V there is a unique element x + y in V, and for each element a in F and
each element x in V there is a unique element ax in V, such that the following
conditions hold.

²Fields are discussed in Appendix C.


(VS 1) For all x, y in V, $x + y = y + x$ (commutativity of addition).

(VS 2) For all x, y, z in V, $(x + y) + z = x + (y + z)$ (associativity of
addition).

(VS 3) There exists an element in V denoted by 0 such that $x + 0 = x$ for
each x in V.

(VS 4) For each element x in V there exists an element y in V such that
$x + y = 0$.

(VS 5) For each element x in V, $1x = x$.

(VS 6) For each pair of elements a, b in F and each element x in V,
$(ab)x = a(bx)$.

(VS 7) For each element a in F and each pair of elements x, y in V,
$a(x + y) = ax + ay$.

(VS 8) For each pair of elements a, b in F and each element x in V,
$(a + b)x = ax + bx$.


The elements x + y and ax are called the sum of x and y and the product
of a and x, respectively.


The elements of the field F are called scalars and the elements of the
vector space V are called vectors. The reader should not confuse this use of 
the word “vector” with the physical entity discussed in Section 1.1: the word 
“vector” is now being used to describe any element of a vector space. 

A vector space is frequently discussed in the text without explicitly men- 
tioning its field of scalars. The reader is cautioned to remember, however, 
that every vector space is regarded as a vector space over a given field, which 
is denoted by F. Occasionally we restrict our attention to the fields of real 
and complex numbers, which are denoted R and C, respectively. 

Observe that (VS 2) permits us to unambiguously define the addition of 
any finite number of vectors (without the use of parentheses). 

In the remainder of this section we introduce several important examples 
of vector spaces that are studied throughout this text. Observe that in de- 
scribing a vector space, it is necessary to specify not only the vectors but also 
the operations of addition and scalar multiplication. 

An object of the form $(a_1, a_2, \ldots, a_n)$, where the entries $a_1, a_2, \ldots, a_n$ are
elements of a field F, is called an n-tuple with entries from F. The elements
$a_1, a_2, \ldots, a_n$ are called the entries or components of the n-tuple. Two
n-tuples $(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$ with entries from a field F are
called equal if $a_i = b_i$ for $i = 1, 2, \ldots, n$.


Example 1 


The set of all n-tuples with entries from a field F is denoted by $F^n$. This set is a
vector space over F with the operations of coordinatewise addition and scalar
multiplication; that is, if $u = (a_1, a_2, \ldots, a_n) \in F^n$, $v = (b_1, b_2, \ldots, b_n) \in F^n$,
and $c \in F$, then

$$u + v = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n) \quad\text{and}\quad cu = (ca_1, ca_2, \ldots, ca_n).$$

Thus $R^3$ is a vector space over R. In this vector space,

$$(3, -2, 0) + (-1, 1, 4) = (2, -1, 4) \quad\text{and}\quad -5(1, -2, 0) = (-5, 10, 0).$$

Similarly, $C^2$ is a vector space over C. In this vector space,

$$(1 + i, 2) + (2 - 3i, 4i) = (3 - 2i, 2 + 4i) \quad\text{and}\quad i(1 + i, 2) = (-1 + i, 2i).$$

Vectors in $F^n$ may be written as column vectors

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$

rather than as row vectors $(a_1, a_2, \ldots, a_n)$. Since a 1-tuple whose only entry
is from F can be regarded as an element of F, we usually write F rather than
$F^1$ for the vector space of 1-tuples with entry from F.
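The coordinatewise operations of Example 1 translate directly into Python, with n-tuples represented as tuples. In the sketch below, the names `vec_add` and `vec_scale` are hypothetical. Python's built-in `complex` type, with `1j` denoting i, stands in for C, and the sketch reproduces the $R^3$ and $C^2$ computations above.

```python
def vec_add(u, v):
    """Coordinatewise sum of two n-tuples with entries from the same field."""
    if len(u) != len(v):
        raise ValueError("n-tuples must have the same length")
    return tuple(a + b for a, b in zip(u, v))

def vec_scale(c, u):
    """Scalar multiple cu = (c*a_1, ..., c*a_n)."""
    return tuple(c * a for a in u)

print(vec_add((3, -2, 0), (-1, 1, 4)))      # (2, -1, 4)
print(vec_scale(-5, (1, -2, 0)))            # (-5, 10, 0)
print(vec_add((1 + 1j, 2), (2 - 3j, 4j)))   # ((3-2j), (2+4j))
print(vec_scale(1j, (1 + 1j, 2)))           # ((-1+1j), 2j)
```

The same two functions serve for every field whose elements Python can add and multiply, which mirrors the point that the definition of $F^n$ does not depend on the particular field.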


An $m \times n$ matrix with entries from a field F is a rectangular array of the
form

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},$$

where each entry $a_{ij}$ ($1 \le i \le m$, $1 \le j \le n$) is an element of F. We
call the entries $a_{ij}$ with $i = j$ the diagonal entries of the matrix. The
entries $a_{i1}, a_{i2}, \ldots, a_{in}$ compose the ith row of the matrix, and the entries
$a_{1j}, a_{2j}, \ldots, a_{mj}$ compose the jth column of the matrix. The rows of the
preceding matrix are regarded as vectors in $F^n$, and the columns are regarded
as vectors in $F^m$. The $m \times n$ matrix in which each entry equals zero is called
the zero matrix and is denoted by O.
In this book, we denote matrices by capital italic letters (e.g., A, B, and
C), and we denote the entry of a matrix A that lies in row i and column j by
$A_{ij}$. In addition, if the number of rows and the number of columns of a
matrix are equal, the matrix is called square.

Two $m \times n$ matrices A and B are called equal if all their corresponding
entries are equal, that is, if $A_{ij} = B_{ij}$ for $1 \le i \le m$ and $1 \le j \le n$.


Example 2 


The set of all $m \times n$ matrices with entries from a field F is a vector space, which
we denote by $M_{m \times n}(F)$, with the following operations of matrix addition
and scalar multiplication: For $A, B \in M_{m \times n}(F)$ and $c \in F$,

$$(A + B)_{ij} = A_{ij} + B_{ij} \quad\text{and}\quad (cA)_{ij} = cA_{ij}$$

for $1 \le i \le m$ and $1 \le j \le n$. For instance,


DO W\i (a8. HR 6) (8 22. 
ie, 8 8 Aas oa 8 


and 


in $M_{2 \times 3}(R)$. ♦
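The entrywise definitions $(A + B)_{ij} = A_{ij} + B_{ij}$ and $(cA)_{ij} = cA_{ij}$ can be sketched in Python with matrices stored as lists of rows. The helper names `mat_add` and `mat_scale` are hypothetical, and the sample $2 \times 3$ matrices are chosen only for illustration. They are not the matrices of the example in the text.

```python
def mat_add(A, B):
    """(A + B)_ij = A_ij + B_ij; A and B must both be m x n."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

def mat_scale(c, A):
    """(cA)_ij = c * A_ij."""
    return [[c * a for a in row] for row in A]

A = [[1, 0, -2], [3, 4, 5]]   # illustrative 2 x 3 matrices (not from the text)
B = [[2, 2, 2], [-1, 0, 6]]
print(mat_add(A, B))     # [[3, 2, 0], [2, 4, 11]]
print(mat_scale(-2, A))  # [[-2, 0, 4], [-6, -8, -10]]
```

Storing a matrix as a list of rows matches the convention that the rows of an $m \times n$ matrix are regarded as vectors in $F^n$.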


Example 3 


Let S be any nonempty set and F be any field, and let $\mathcal{F}(S, F)$ denote the
set of all functions from S to F. Two functions f and g in $\mathcal{F}(S, F)$ are called
equal if $f(s) = g(s)$ for each $s \in S$. The set $\mathcal{F}(S, F)$ is a vector space with
the operations of addition and scalar multiplication defined for $f, g \in \mathcal{F}(S, F)$
and $c \in F$ by

$$(f + g)(s) = f(s) + g(s) \quad\text{and}\quad (cf)(s) = c[f(s)]$$

for each $s \in S$. Note that these are the familiar operations of addition and
scalar multiplication for functions used in algebra and calculus. 
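In Python, the operations of Example 3 can be sketched as functions that build new functions, which are closures. The names `f_add`, `f_scale`, and `equal_on` are hypothetical. The sketch also assumes that S is finite, so that equality in $\mathcal{F}(S, F)$ can be tested by checking every $s \in S$. For an infinite S, such a test is not possible, and the sample functions here are arbitrary.

```python
def f_add(f, g):
    """(f + g)(s) = f(s) + g(s)."""
    return lambda s: f(s) + g(s)

def f_scale(c, f):
    """(cf)(s) = c[f(s)]."""
    return lambda s: c * f(s)

def equal_on(f, g, S):
    """f = g in F(S, F) means f(s) = g(s) for every s in S (S assumed finite)."""
    return all(f(s) == g(s) for s in S)

S = {0, 1, 2}
f = lambda s: s * s          # hypothetical sample functions from S to R
g = lambda s: 3 * s - 2
h = f_add(f_scale(2, f), g)  # h(s) = 2s^2 + 3s - 2
print([h(s) for s in sorted(S)])              # [-2, 3, 12]
print(equal_on(f_add(f, g), f_add(g, f), S))  # True
```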


A polynomial with coefficients from a field F is an expression of the form

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$$

where n is a nonnegative integer and each $a_k$, called the coefficient of $x^k$, is
in F. If $f(x) = 0$, that is, if $a_n = a_{n-1} = \cdots = a_0 = 0$, then f(x) is called
the zero polynomial and, for convenience, its degree is defined to be $-1$;
otherwise, the degree of a polynomial is defined to be the largest exponent
of x that appears in the representation

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

with a nonzero coefficient. Note that the polynomials of degree zero may be
written in the form $f(x) = c$ for some nonzero scalar c. Two polynomials,

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
and
$$g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0,$$

are called equal if $m = n$ and $a_i = b_i$ for $i = 0, 1, \ldots, n$.

When F is a field containing infinitely many scalars, we usually regard
a polynomial with coefficients from F as a function from F into F. (See
page 569.) In this case, the value of the function

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

at $c \in F$ is the scalar

$$f(c) = a_n c^n + a_{n-1} c^{n-1} + \cdots + a_1 c + a_0.$$

Here either of the notations f or f(x) is used for the polynomial function

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0.$$
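As an illustration of the two definitions above, the following Python sketch assumes that a polynomial is stored as its coefficient list $[a_0, a_1, \ldots, a_n]$. It computes the degree, using the convention that the zero polynomial has degree $-1$, and the value $f(c)$. The names `degree` and `evaluate` are hypothetical. The evaluation uses Horner's rule, which regroups $a_n c^n + \cdots + a_1 c + a_0$ as $(\cdots(a_n c + a_{n-1})c + \cdots)c + a_0$ and gives the same scalar.

```python
def degree(coeffs):
    """Degree of sum(coeffs[k] * x**k); the zero polynomial has degree -1."""
    for k in range(len(coeffs) - 1, -1, -1):
        if coeffs[k] != 0:
            return k
    return -1

def evaluate(coeffs, c):
    """The scalar a_n c^n + ... + a_1 c + a_0, computed by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * c + a
    return result

p = [3, 0, -1, 2]                 # 2x^3 - x^2 + 3
print(degree(p), evaluate(p, 2))  # 3 15
print(degree([0, 0]))             # -1
```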
Example 4 
Let

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

and

$$g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0$$

be polynomials with coefficients from a field F. Suppose that $m \le n$, and
define $b_{m+1} = b_{m+2} = \cdots = b_n = 0$. Then g(x) can be written as

$$g(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0.$$

Define

$$f(x) + g(x) = (a_n + b_n)x^n + (a_{n-1} + b_{n-1})x^{n-1} + \cdots + (a_1 + b_1)x + (a_0 + b_0)$$

and for any $c \in F$, define

$$cf(x) = ca_n x^n + ca_{n-1} x^{n-1} + \cdots + ca_1 x + ca_0.$$

With these operations of addition and scalar multiplication, the set of all
polynomials with coefficients from F is a vector space, which we denote by
P(F).
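The padding step in Example 4, which sets $b_{m+1} = \cdots = b_n = 0$, corresponds to filling the shorter coefficient list with zeros before adding like powers. The following Python sketch assumes the coefficient-list representation $[a_0, a_1, \ldots, a_n]$. The names `poly_add` and `poly_scale` are hypothetical. `itertools.zip_longest` with `fillvalue=0` performs the padding.

```python
from itertools import zip_longest

def poly_add(f, g):
    """Add coefficient lists [a_0, ..., a_n] and [b_0, ..., b_m].

    The shorter list is padded with zeros (b_{m+1} = ... = b_n = 0),
    and coefficients of like powers of x are added.
    """
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]

def poly_scale(c, f):
    """Coefficients of cf(x): each a_k is replaced by c * a_k."""
    return [c * a for a in f]

f = [1, 0, 4, 5]   # 5x^3 + 4x^2 + 1
g = [-1, 2]        # 2x - 1
print(poly_add(f, g))    # [0, 2, 4, 5], that is, 5x^3 + 4x^2 + 2x
print(poly_scale(3, g))  # [-3, 6]
```

The result may end in zero coefficients (for example, adding $x^2$ and $-x^2$ gives `[0, 0, 0]`). Under this representation, such a list still describes the zero polynomial, so the degree should be read off from the last nonzero coefficient, not from the list length.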
We will see in Exercise 23 of Section 2.4 that the vector space defined in
the next example is essentially the same as P(F).


Example 5 


Let F be any field. A sequence in F is a function σ from the positive integers
into F. In this book, the sequence σ such that $\sigma(n) = a_n$ for $n = 1, 2, \ldots$ is
denoted $\{a_n\}$. Let V consist of all sequences $\{a_n\}$ in F that have only a finite
number of nonzero terms $a_n$. If $\{a_n\}$ and $\{b_n\}$ are in V and $t \in F$, define

$$\{a_n\} + \{b_n\} = \{a_n + b_n\} \quad\text{and}\quad t\{a_n\} = \{ta_n\}.$$

With these operations V is a vector space.
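Because each sequence in V has only finitely many nonzero terms, one possible encoding, assumed here for illustration, stores a sequence as a Python dictionary mapping each index n with $a_n \neq 0$ to $a_n$. Every missing index means the term is 0. The names `seq_add`, `seq_scale`, and `term` are hypothetical. The sketch drops zero terms after each operation so that only the finitely many nonzero terms are kept.

```python
def seq_add(a, b):
    """{a_n} + {b_n} = {a_n + b_n}; a missing index means that term is 0."""
    total = {}
    for n in set(a) | set(b):
        value = a.get(n, 0) + b.get(n, 0)
        if value != 0:  # store only the finitely many nonzero terms
            total[n] = value
    return total

def seq_scale(t, a):
    """t{a_n} = {t a_n}, again keeping only nonzero terms."""
    return {n: t * v for n, v in a.items() if t * v != 0}

def term(a, n):
    """The nth term a_n of the encoded sequence."""
    return a.get(n, 0)

a = {1: 2, 3: -1}   # 2, 0, -1, 0, 0, ...
b = {3: 1, 4: 7}    # 0, 0, 1, 7, 0, ...
s = seq_add(a, b)
print(sorted(s.items()))                  # [(1, 2), (4, 7)]
print([term(s, n) for n in range(1, 6)])  # [2, 0, 0, 7, 0]
```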


Our next two examples contain sets on which addition and scalar multi- 
plication are defined, but which are not vector spaces. 


Example 6 
Let $S = \{(a_1, a_2) : a_1, a_2 \in R\}$. For $(a_1, a_2), (b_1, b_2) \in S$ and $c \in R$, define
$$(a_1, a_2) + (b_1, b_2) = (a_1 + b_1, a_2 - b_2) \quad\text{and}\quad c(a_1, a_2) = (ca_1, ca_2).$$


Since (VS 1), (VS 2), and (VS 8) fail to hold, S is not a vector space with 
these operations. 


Example 7 
Let S be as in Example 6. For $(a_1, a_2), (b_1, b_2) \in S$ and $c \in R$, define

$$(a_1, a_2) + (b_1, b_2) = (a_1 + b_1, 0) \quad\text{and}\quad c(a_1, a_2) = (ca_1, 0).$$


Then S is not a vector space with these operations because (VS 3) (hence 
(VS 4)) and (VS 5) fail. 
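To show that a condition fails, a single counterexample is enough. The following Python sketch encodes the operations of Examples 6 and 7. The names `add6`, `scale6`, `add7`, and `scale7` are hypothetical. It then exhibits concrete vectors and scalars for which (VS 1) and (VS 8) fail in Example 6 and (VS 5) fails in Example 7. The sample values are arbitrary choices.

```python
# Operations of Example 6
def add6(x, y):
    return (x[0] + y[0], x[1] - y[1])

def scale6(c, x):
    return (c * x[0], c * x[1])

# Operations of Example 7
def add7(x, y):
    return (x[0] + y[0], 0)

def scale7(c, x):
    return (c * x[0], 0)

x, y = (1, 2), (3, 4)

# (VS 1) fails in Example 6: x + y differs from y + x.
print(add6(x, y), add6(y, x))  # (4, -2) (4, 2)

# (VS 8) fails in Example 6: (a + b)x differs from ax + bx for a = b = 1.
a, b = 1, 1
print(scale6(a + b, x), add6(scale6(a, x), scale6(b, x)))  # (2, 4) (2, 0)

# (VS 5) fails in Example 7: 1x differs from x when x has a nonzero second entry.
print(scale7(1, x))  # (1, 0)
```

A similar observation explains the failure of (VS 3) in Example 7: every sum has second entry 0, so no vector 0 can satisfy $x + 0 = x$ for a vector x whose second entry is nonzero.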


We conclude this section with a few of the elementary consequences of the 
definition of a vector space. 


Theorem 1.1 (Cancellation Law for Vector Addition). If x, y,
and z are vectors in a vector space V such that $x + z = y + z$, then $x = y$.


Proof. There exists a vector v in V such that $z + v = 0$ (VS 4). Thus

$$x = x + 0 = x + (z + v) = (x + z) + v = (y + z) + v = y + (z + v) = y + 0 = y$$

by (VS 2) and (VS 3). ■


Corollary 1. The vector 0 described in (VS 3) is unique. 
Proof. Exercise. ■
Corollary 2. The vector y described in (VS 4) is unique.
Proof. Exercise. ■


The vector 0 in (VS 3) is called the zero vector of V, and the vector y in
(VS 4) (that is, the unique vector such that $x + y = 0$) is called the additive
inverse of x and is denoted by $-x$.

The next result contains some of the elementary properties of scalar mul- 
tiplication. 


Theorem 1.2. In any vector space V, the following statements are true:
(a) $0x = 0$ for each $x \in V$.
(b) $(-a)x = -(ax) = a(-x)$ for each $a \in F$ and each $x \in V$.
(c) $a0 = 0$ for each $a \in F$.


Proof. (a) By (VS 8), (VS 3), and (VS 1), it follows that
$$0x + 0x = (0 + 0)x = 0x = 0x + 0 = 0 + 0x.$$

Hence $0x = 0$ by Theorem 1.1.

(b) The vector $-(ax)$ is the unique element of V such that $ax + [-(ax)] = 0$.
Thus if $ax + (-a)x = 0$, Corollary 2 to Theorem 1.1 implies that $(-a)x = -(ax)$.
But by (VS 8),

$$ax + (-a)x = [a + (-a)]x = 0x = 0$$

by (a). Consequently $(-a)x = -(ax)$. In particular, $(-1)x = -x$. So,
by (VS 6),
$$a(-x) = a[(-1)x] = [a(-1)]x = (-a)x.$$
The proof of (c) is similar to the proof of (a). ■

EXERCISES 


1. Label the following statements as true or false. 


(a) Every vector space contains a zero vector. 

(b) A vector space may have more than one zero vector. 

(c) In any vector space, $ax = bx$ implies that $a = b$.

(d) In any vector space, ax = ay implies that x = y. 

(e) A vector in $F^n$ may be regarded as a matrix in $M_{n \times 1}(F)$.

(f) An $m \times n$ matrix has m columns and n rows.

(g) In P(F), only polynomials of the same degree may be added. 

(h) If f and g are polynomials of degree n, then f +g is a polynomial 
of degree n. 

(i) If f is a polynomial of degree n and c is a nonzero scalar, then cf 
is a polynomial of degree n. 
(j) A nonzero scalar of F may be considered to be a polynomial in
P(F) having degree zero.

(k) Two functions in $\mathcal{F}(S, F)$ are equal if and only if they have the
same value at each element of S.


2. Write the zero vector of $M_{3 \times 4}(F)$.
3. If 


what are Mj3, Mao,, and Mo2? 


4. Perform the indicated operations. 


0 GE) 99 


6 BN ft 5 
(by 82 ils 0s 
be 8) NB 30 
Bhs 
(<) a7 0 _) 
6 4 
(@)i-8 | 3.9 
1 8 


(e) (22+ — 7x3 + 42 +3) + ( 

(f) $(-3x^3 + 7x^2 + 8x - 6) + (2x^3 - 8x + 10)$
(g) 5(2x” — 6x4 + 82? — 3x) 

(h) 3(a° — 2x3 + 4x + 2) 


Exercises 5 and 6 show why the definitions of matrix addition and scalar 
multiplication (as defined in Example 2) are the appropriate ones. 


5. Richard Gard (“Effects of Beaver on Trout in Sagehen Creek, Cali- 
fornia,” J. Wildlife Management, 25, 221-242) reports the following 
number of trout having crossed beaver dams in Sagehen Creek. 


Upstream Crossings

                 Fall   Spring   Summer
Brook trout        8      3        1
Rainbow trout      3      0        0
Brown trout        3      0        0
Downstream Crossings


Fall Spring Summer 
Brook trout 9 at 4 
Rainbow trout 3 0 0 
Brown trout 1 ub 0 


Record the upstream and downstream crossings in two 3 x 3 matrices, 
and verify that the sum of these matrices gives the total number of 
crossings (both upstream and downstream) categorized by trout species 
and season. 


6. At the end of May, a furniture store had the following inventory. 


                     Early American   Spanish   Mediterranean   Danish
Living room suites         4             2            1             3
Bedroom suites             5             1            1             4
Dining room suites         3             1            2             6


Record these data as a 3 x 4 matrix M. To prepare for its June sale, 
the store decided to double its inventory on each of the items listed in 
the preceding table. Assuming that none of the present stock is sold 
until the additional furniture arrives, verify that the inventory on hand 
after the order is filled is described by the matrix 2M. If the inventory 
at the end of June is described by the matrix 


5 
A= |6 
1 


On w 


1 
1 
3 
interpret $2M - A$. How many suites were sold during the June sale? 


7. Let $S = \{0, 1\}$ and $F = R$. In $\mathcal{F}(S, R)$, show that $f = g$ and $f + g = h$, 
where $f(t) = 2t + 1$, $g(t) = 1 + 4t - 2t^2$, and $h(t) = 5^t + 1$. 
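A quick numerical check of Exercise 7 makes the pointwise notion of equality in $\mathcal{F}(S, R)$ concrete. This is only a sketch. The helper `as_table` is our own, and it works because S is finite, so a function on S is completely described by its table of values.

```python
# Functions on the finite set S = {0, 1} are equal exactly when they
# agree at every point of S; we compare their value tables.
S = [0, 1]

def f(t):
    return 2 * t + 1

def g(t):
    return 1 + 4 * t - 2 * t ** 2

def h(t):
    return 5 ** t + 1

def as_table(func):
    """Record the value of func at each element of S (hypothetical helper)."""
    return {t: func(t) for t in S}

f_equals_g = as_table(f) == as_table(g)
# (f + g)(t) is defined pointwise as f(t) + g(t).
sum_equals_h = {t: f(t) + g(t) for t in S} == as_table(h)
print(f_equals_g, sum_equals_h)  # True True
```

On a larger set, such as all of R, a finite table can only spot-check equality. It cannot prove it.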


8. In any vector space V, show that $(a + b)(x + y) = ax + ay + bx + by$ for 
any $x, y \in V$ and any $a, b \in F$. 


9. Prove Corollaries 1 and 2 of Theorem 1.1 and Theorem 1.2(c). 


10. Let V denote the set of all differentiable real-valued functions defined 
on the real line. Prove that V is a vector space with the operations of 
addition and scalar multiplication defined in Example 3. 
11. Let V = {0} consist of a single vector 0 and define 0 + 0 = 0 and 
c0 = 0 for each scalar c in F. Prove that V is a vector space over F. 
(V is called the zero vector space.) 


12. A real-valued function f defined on the real line is called an even 
function if $f(-t) = f(t)$ for each real number t. Prove that the set of even 
functions defined on the real line with the operations of addition and 
scalar multiplication defined in Example 3 is a vector space. 


13. Let V denote the set of ordered pairs of real numbers. If $(a_1, a_2)$ and 
$(b_1, b_2)$ are elements of V and $c \in R$, define 

$$(a_1, a_2) + (b_1, b_2) = (a_1 + b_1, a_2b_2) \quad\text{and}\quad c(a_1, a_2) = (ca_1, a_2).$$ 

Is V a vector space over R with these operations? Justify your answer. 

14. Let $V = \{(a_1, a_2, \ldots, a_n) : a_i \in C \text{ for } i = 1, 2, \ldots, n\}$; so V is a vector 
space over C by Example 1. Is V a vector space over the field of real 
numbers with the operations of coordinatewise addition and multipli- 
cation? 


15. Let $V = \{(a_1, a_2, \ldots, a_n) : a_i \in R \text{ for } i = 1, 2, \ldots, n\}$; so V is a vector 
space over R by Example 1. Is V a vector space over the field of 
complex numbers with the operations of coordinatewise addition and 
multiplication? 


16. Let V denote the set of all $m \times n$ matrices with real entries; so V 
is a vector space over R by Example 2. Let F be the field of rational 
numbers. Is V a vector space over F with the usual definitions of matrix 
addition and scalar multiplication? 


17. Let $V = \{(a_1, a_2) : a_1, a_2 \in F\}$, where F is a field. Define addition of 
elements of V coordinatewise, and for $c \in F$ and $(a_1, a_2) \in V$, define 

$$c(a_1, a_2) = (a_1, 0).$$ 

Is V a vector space over F with these operations? Justify your answer. 


Let V = {(a1,@2): a1,a2 € R}. For (a1,a2),(b1,b2) € V andce R, 
define 


(a1, a2) + (b1, b2) = (a1 + 2b1,a2 + 3b2) and c(a1, a2) = (caj, Caz). 


Is V a vector space over R with these operations? Justify your answer. 
19. Let $V = \{(a_1, a_2) : a_1, a_2 \in R\}$. Define addition of elements of V 
coordinatewise, and for $(a_1, a_2)$ in V and $c \in R$, define 

$$c(a_1, a_2) = \begin{cases} (0, 0) & \text{if } c = 0, \\ \left(ca_1, \dfrac{a_2}{c}\right) & \text{if } c \neq 0. \end{cases}$$ 

Is V a vector space over R with these operations? Justify your answer. 


20. Let V be the set of sequences $\{a_n\}$ of real numbers. (See Example 5 for 
the definition of a sequence.) For $\{a_n\}, \{b_n\} \in V$ and any real number 
t, define 

$$\{a_n\} + \{b_n\} = \{a_n + b_n\} \quad\text{and}\quad t\{a_n\} = \{ta_n\}.$$ 

Prove that, with these operations, V is a vector space over R. 


21. Let V and W be vector spaces over a field F. Let 
$$Z = \{(v, w) : v \in V \text{ and } w \in W\}.$$ 
Prove that Z is a vector space over F with the operations 

$$(v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2) \quad\text{and}\quad c(v_1, w_1) = (cv_1, cw_1).$$ 


22. How many matrices are there in the vector space $M_{n \times n}(Z_2)$? (See 
Appendix C.) 


1.3 SUBSPACES 


In the study of any algebraic structure, it is of interest to examine subsets that 
possess the same structure as the set under consideration. The appropriate 
notion of substructure for vector spaces is introduced in this section. 


Definition. A subset W of a vector space V over a field F is called a 
subspace of V if W is a vector space over F with the operations of addition 
and scalar multiplication defined on V. 


In any vector space V, note that V and {0} are subspaces. The latter is 
called the zero subspace of V. 

Fortunately it is not necessary to verify all of the vector space properties 
to prove that a subset is a subspace. Because properties (VS 1), (VS 2), 
(VS 5), (VS 6), (VS 7), and (VS 8) hold for all vectors in the vector space, 
these properties automatically hold for the vectors in any subset. Thus a 
subset W of a vector space V is a subspace of V if and only if the following 
four properties hold. 
1. $x + y \in W$ whenever $x \in W$ and $y \in W$. (W is closed under addition.) 

2. $cx \in W$ whenever $c \in F$ and $x \in W$. (W is closed under scalar 
multiplication.) 

3. W has a zero vector. 

4. Each vector in W has an additive inverse in W. 


The next theorem shows that the zero vector of W must be the same as 
the zero vector of V and that property 4 is redundant. 


Theorem 1.3. Let V be a vector space and W a subset of V. Then W 
is a subspace of V if and only if the following three conditions hold for the 
operations defined in V. 

(a) $0 \in W$. 
(b) $x + y \in W$ whenever $x \in W$ and $y \in W$. 
(c) $cx \in W$ whenever $c \in F$ and $x \in W$. 


Proof. If W is a subspace of V, then W is a vector space with the operations 
of addition and scalar multiplication defined on V. Hence conditions (b) and 
(c) hold, and there exists a vector $0' \in W$ such that $x + 0' = x$ for each 
$x \in W$. But also $x + 0 = x$, and thus $0' = 0$ by Theorem 1.1 (p. 11). So 
condition (a) holds. 

Conversely, if conditions (a), (b), and (c) hold, the discussion preceding 
this theorem shows that W is a subspace of V if the additive inverse of each 
vector in W lies in W. But if $x \in W$, then $(-1)x \in W$ by condition (c), and 
$-x = (-1)x$ by Theorem 1.2 (p. 12). Hence W is a subspace of V. ∎ 


The preceding theorem provides a simple method for determining whether 
or not a given subset of a vector space is a subspace. Normally, it is this result 
that is used to prove that a subset is, in fact, a subspace. 

The transpose $A^t$ of an $m \times n$ matrix A is the $n \times m$ matrix obtained 
from A by interchanging the rows with the columns; that is, $(A^t)_{ij} = A_{ji}$. 
For example, 


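$$\begin{pmatrix} 1 & -2 & 3 \\ 0 & 5 & -1 \end{pmatrix}^t = \begin{pmatrix} 1 & 0 \\ -2 & 5 \\ 3 & -1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}^t = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}.$$ 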
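As a sketch, the rule $(A^t)_{ij} = A_{ji}$ amounts to reading the columns of A as rows. The list-of-rows representation and the name `transpose` are our own choices, not notation from the text, and the sample matrices are illustrative.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows.

    zip(*A) groups the j-th entries of every row, i.e. the columns of A,
    so result[i][j] == A[j][i].
    """
    return [list(column) for column in zip(*A)]

A = [[1, 2, 3], [4, 5, 6]]   # an illustrative 2 x 3 matrix
S = [[1, 2], [2, 3]]         # a square matrix equal to its own transpose

print(transpose(A))          # [[1, 4], [2, 5], [3, 6]]
print(transpose(S) == S)     # True
```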
A symmetric matrix is a matrix A such that $A^t = A$. For example, the 
2 × 2 matrix displayed above is a symmetric matrix. Clearly, a symmetric 
matrix must be square. The set W of all symmetric matrices in $M_{n \times n}(F)$ is 
a subspace of $M_{n \times n}(F)$ since the conditions of Theorem 1.3 hold: 


1. The zero matrix is equal to its transpose and hence belongs to W. 


It is easily proved that for any matrices A and B and any scalars a and b, 
$(aA + bB)^t = aA^t + bB^t$. (See Exercise 3.) Using this fact, we show that the 
set of symmetric matrices is closed under addition and scalar multiplication. 
2. If $A \in W$ and $B \in W$, then $A^t = A$ and $B^t = B$. Thus $(A + B)^t = 
A^t + B^t = A + B$, so that $A + B \in W$. 


3. If $A \in W$, then $A^t = A$. So for any $a \in F$, we have $(aA)^t = aA^t = aA$. 
Thus $aA \in W$. 
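When the field is finite, Theorem 1.3 can also be checked by brute force. The sketch below confirms conditions (a), (b), and (c) for the symmetric matrices. It assumes $F = Z_2$ (arithmetic mod 2) and $n = 2$ only so that every matrix can be listed. An exhaustive check like this illustrates the argument above in one small case; it does not replace the proof.

```python
from itertools import product

P = 2  # assumption: work over the field Z_2, so all arithmetic is mod 2

def transpose(A):
    return tuple(zip(*A))

def add(A, B):
    return tuple(tuple((a + b) % P for a, b in zip(ra, rb)) for ra, rb in zip(A, B))

def scale(c, A):
    return tuple(tuple((c * a) % P for a in row) for row in A)

# All 2 x 2 matrices over Z_2, stored as tuples of row tuples.
all_matrices = [((a, b), (c, d)) for a, b, c, d in product(range(P), repeat=4)]
W = [A for A in all_matrices if transpose(A) == A]   # the symmetric ones

zero = ((0, 0), (0, 0))
cond_a = zero in W                                          # condition (a)
cond_b = all(add(A, B) in W for A in W for B in W)          # condition (b)
cond_c = all(scale(c, A) in W for c in range(P) for A in W) # condition (c)
print(len(W), cond_a, cond_b, cond_c)  # 8 True True True
```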


The examples that follow provide further illustrations of the concept of a 
subspace. The first three are particularly important. 


Example 1 


Let n be a nonnegative integer, and let $P_n(F)$ consist of all polynomials in 
P(F) having degree less than or equal to n. Since the zero polynomial has 
degree $-1$, it is in $P_n(F)$. Moreover, the sum of two polynomials with degrees 
less than or equal to n is another polynomial of degree less than or equal to n, 
and the product of a scalar and a polynomial of degree less than or equal to 
n is a polynomial of degree less than or equal to n. So $P_n(F)$ is closed under 
addition and scalar multiplication. It therefore follows from Theorem 1.3 that 
$P_n(F)$ is a subspace of P(F). 


Example 2 


Let C(R) denote the set of all continuous real-valued functions defined on R. 
Clearly C(R) is a subset of the vector space F(R, R) defined in Example 3 
of Section 1.2. We claim that C(R) is a subspace of F(R, R). First note 
that the zero of F(R, R) is the constant function defined by $f(t) = 0$ for all 
$t \in R$. Since constant functions are continuous, we have $f \in C(R)$. Moreover, 
the sum of two continuous functions is continuous, and the product of a real 
number and a continuous function is continuous. So C(R) is closed under 
addition and scalar multiplication and hence is a subspace of F(R, R) by 
Theorem 1.3. 


Example 3 


An $n \times n$ matrix M is called a diagonal matrix if $M_{ij} = 0$ whenever $i \neq j$, 
that is, if all its nondiagonal entries are zero. Clearly the zero matrix is a 
diagonal matrix because all of its entries are 0. Moreover, if A and B are 
diagonal $n \times n$ matrices, then whenever $i \neq j$, 

$$(A + B)_{ij} = A_{ij} + B_{ij} = 0 + 0 = 0 \quad\text{and}\quad (cA)_{ij} = cA_{ij} = c0 = 0$$ 

for any scalar c. Hence $A + B$ and $cA$ are diagonal matrices for any scalar 
c. Therefore the set of diagonal matrices is a subspace of $M_{n \times n}(F)$ by 
Theorem 1.3. 


Example 4 


The trace of an n x n matrix M, denoted tr(M), is the sum of the diagonal 
entries of M; that is, 


$$\operatorname{tr}(M) = M_{11} + M_{22} + \cdots + M_{nn}.$$ 

It follows from Exercise 6 that the set of $n \times n$ matrices having trace equal 
to zero is a subspace of $M_{n \times n}(F)$. 
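The trace is easy to compute, and the linearity identity of Exercise 6 is what keeps the trace-zero matrices closed under addition and scalar multiplication. The sketch below uses matrices and scalars of our own choosing. Exact arithmetic comes from `fractions.Fraction`.

```python
from fractions import Fraction

def trace(M):
    """Sum of the diagonal entries M[0][0] + M[1][1] + ... of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def combine(a, A, b, B):
    """Return the matrix aA + bB, computed entry by entry."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Two trace-zero matrices (illustrative choices).
A = [[1, 7], [2, -1]]
B = [[Fraction(1, 2), 0], [5, Fraction(-1, 2)]]
a, b = Fraction(3), Fraction(-4)

C = combine(a, A, b, B)
print(trace(A), trace(B), trace(C))             # 0 0 0
print(trace(C) == a * trace(A) + b * trace(B))  # True
```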


Example 5 


The set of matrices in $M_{m \times n}(R)$ having nonnegative entries is not a subspace 
of $M_{m \times n}(R)$ because it is not closed under scalar multiplication (by negative 
scalars). 


The next theorem shows how to form a new subspace from other sub- 
spaces. 


Theorem 1.4. Any intersection of subspaces of a vector space V is a 
subspace of V. 


Proof. Let $\mathcal{C}$ be a collection of subspaces of V, and let W denote the 
intersection of the subspaces in $\mathcal{C}$. Since every subspace contains the zero 
vector, $0 \in W$. Let $a \in F$ and $x, y \in W$. Then x and y are contained in each 
subspace in $\mathcal{C}$. Because each subspace in $\mathcal{C}$ is closed under addition and scalar 
multiplication, it follows that $x + y$ and $ax$ are contained in each subspace in 
$\mathcal{C}$. Hence $x + y$ and $ax$ are also contained in W, so that W is a subspace of V 
by Theorem 1.3. ∎ 


Having shown that the intersection of subspaces of a vector space V is a 
subspace of V, it is natural to consider whether or not the union of subspaces 
of V is a subspace of V. It is easily seen that the union of subspaces must 
contain the zero vector and be closed under scalar multiplication, but in 
general the union of subspaces of V need not be closed under addition. In fact, 
it can be readily shown that the union of two subspaces of V is a subspace of V 
if and only if one of the subspaces contains the other. (See Exercise 19.) There 
is, however, a natural way to combine two subspaces $W_1$ and $W_2$ to obtain 
a subspace that contains both $W_1$ and $W_2$. As we have already suggested, 
the key to finding such a subspace is to ensure that it is closed under 
addition. This idea is explored in Exercise 23. 
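Here is a concrete instance of the failure of closure, using subspaces of our own choosing. Take $W_1$ to be the x-axis and $W_2$ the y-axis in $R^2$. Neither contains the other, and the sketch below exhibits two vectors of $W_1 \cup W_2$ whose sum lies outside it.

```python
def in_W1(v):
    """W1: the x-axis {(a, 0)} in R^2."""
    return v[1] == 0

def in_W2(v):
    """W2: the y-axis {(0, b)} in R^2."""
    return v[0] == 0

def in_union(v):
    return in_W1(v) or in_W2(v)

x, y = (1, 0), (0, 1)
s = (x[0] + y[0], x[1] + y[1])                   # (1, 1)
print(in_union(x), in_union(y), in_union(s))     # True True False

# Scalar multiples (including 0 times x, the zero vector) stay in the union.
print(all(in_union((c * x[0], c * x[1])) for c in (-2, 0, 3)))  # True
```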


EXERCISES 


1. Label the following statements as true or false. 


(a) If V is a vector space and W is a subset of V that is a vector space, 
then W is a subspace of V. 

(b) The empty set is a subspace of every vector space. 

(c) If V is a vector space other than the zero vector space, then V 
contains a subspace W such that $W \neq V$. 

(d) The intersection of any two subsets of V is a subspace of V. 
(e) An $n \times n$ diagonal matrix can never have more than n nonzero 
entries. 

(f) The trace of a square matrix is the product of its diagonal entries. 

(g) Let W be the xy-plane in $R^3$; that is, $W = \{(a_1, a_2, 0) : a_1, a_2 \in R\}$. 
Then $W = R^2$. 


2. Determine the transpose of each of the matrices that follow. In addition, 
if the matrix is square, compute its trace. 


oO ot4 


-3 9 10 0 -8 
(c){ 0 -2 (d){ 2 -4 8 
-5 7 6 

(e) (1 -1 3 5) (f) © : : 5 
5 -4 0 6 

(g) | 6 (h){ 0 1 -3 


3. Prove that $(aA + bB)^t = aA^t + bB^t$ for any $A, B \in M_{m \times n}(F)$ and any 
$a, b \in F$. 


4. Prove that $(A^t)^t = A$ for each $A \in M_{m \times n}(F)$. 

5. Prove that $A + A^t$ is symmetric for any square matrix A. 

6. Prove that $\operatorname{tr}(aA + bB) = a\operatorname{tr}(A) + b\operatorname{tr}(B)$ for any $A, B \in M_{n \times n}(F)$. 

7. Prove that diagonal matrices are symmetric matrices. 


8. Determine whether the following sets are subspaces of $R^3$ under the 
operations of addition and scalar multiplication defined on $R^3$. Justify 
your answers. 

(a) $W_1 = \{(a_1, a_2, a_3) \in R^3 : a_1 = 3a_2 \text{ and } a_3 = -a_2\}$ 
(b) $W_2 = \{(a_1, a_2, a_3) \in R^3 : a_1 = a_3 + 2\}$ 
(c) $W_3 = \{(a_1, a_2, a_3) \in R^3 : 2a_1 - 7a_2 + a_3 = 0\}$ 
(d) $W_4 = \{(a_1, a_2, a_3) \in R^3 : a_1 - 4a_2 - a_3 = 0\}$ 
(e) $W_5 = \{(a_1, a_2, a_3) \in R^3 : a_1 + 2a_2 - 3a_3 = 1\}$ 
(f) $W_6 = \{(a_1, a_2, a_3) \in R^3 : 5a_1^2 - 3a_2^2 + 6a_3^2 = 0\}$ 


9. Let $W_1$, $W_3$, and $W_4$ be as in Exercise 8. Describe $W_1 \cap W_3$, $W_1 \cap W_4$, 
and $W_3 \cap W_4$, and observe that each is a subspace of $R^3$. 
10. Prove that $W_1 = \{(a_1, a_2, \ldots, a_n) \in F^n : a_1 + a_2 + \cdots + a_n = 0\}$ is a 
subspace of $F^n$, but $W_2 = \{(a_1, a_2, \ldots, a_n) \in F^n : a_1 + a_2 + \cdots + a_n = 1\}$ 
is not. 


11. Is the set $W = \{f(x) \in P(F) : f(x) = 0 \text{ or } f(x) \text{ has degree } n\}$ a subspace 
of P(F) if $n \geq 1$? Justify your answer. 


12. An $m \times n$ matrix A is called upper triangular if all entries lying below 
the diagonal entries are zero, that is, if $A_{ij} = 0$ whenever $i > j$. Prove 
that the upper triangular matrices form a subspace of $M_{m \times n}(F)$. 


13. Let S be a nonempty set and F a field. Prove that for any $s_0 \in S$, 
$\{f \in \mathcal{F}(S, F) : f(s_0) = 0\}$ is a subspace of $\mathcal{F}(S, F)$. 


14. Let S be a nonempty set and F a field. Let $C(S, F)$ denote the set of 
all functions $f \in \mathcal{F}(S, F)$ such that $f(s) = 0$ for all but a finite number 
of elements of S. Prove that $C(S, F)$ is a subspace of $\mathcal{F}(S, F)$. 


15. Is the set of all differentiable real-valued functions defined on R a 
subspace of C(R)? Justify your answer. 


16. Let $C^n(R)$ denote the set of all real-valued functions defined on the 
real line that have a continuous nth derivative. Prove that $C^n(R)$ is a 
subspace of F(R, R). 


17. Prove that a subset W of a vector space V is a subspace of V if and 
only if $W \neq \emptyset$, and, whenever $a \in F$ and $x, y \in W$, then $ax \in W$ and 
$x + y \in W$. 


18. Prove that a subset W of a vector space V is a subspace of V if and only 
if $0 \in W$ and $ax + y \in W$ whenever $a \in F$ and $x, y \in W$. 


19. Let $W_1$ and $W_2$ be subspaces of a vector space V. Prove that $W_1 \cup W_2$ 
is a subspace of V if and only if $W_1 \subseteq W_2$ or $W_2 \subseteq W_1$. 


20.† Prove that if W is a subspace of a vector space V and $w_1, w_2, \ldots, w_n$ are 
in W, then $a_1w_1 + a_2w_2 + \cdots + a_nw_n \in W$ for any scalars $a_1, a_2, \ldots, a_n$. 


21. Show that the set of convergent sequences $\{a_n\}$ (i.e., those for which 
$\lim_{n \to \infty} a_n$ exists) is a subspace of the vector space V in Exercise 20 of 
Section 1.2. 


22. Let $F_1$ and $F_2$ be fields. A function $g \in \mathcal{F}(F_1, F_2)$ is called an even 
function if $g(-t) = g(t)$ for each $t \in F_1$ and is called an odd function 
if $g(-t) = -g(t)$ for each $t \in F_1$. Prove that the set of all even functions 
in $\mathcal{F}(F_1, F_2)$ and the set of all odd functions in $\mathcal{F}(F_1, F_2)$ are subspaces 
of $\mathcal{F}(F_1, F_2)$. 


†A dagger means that this exercise is essential for a later section. 
The following definitions are used in Exercises 23-30. 


Definition. If $S_1$ and $S_2$ are nonempty subsets of a vector space V, then 
the sum of $S_1$ and $S_2$, denoted $S_1 + S_2$, is the set $\{x + y : x \in S_1 \text{ and } y \in S_2\}$. 


Definition. A vector space V is called the direct sum of $W_1$ and $W_2$ if 
$W_1$ and $W_2$ are subspaces of V such that $W_1 \cap W_2 = \{0\}$ and $W_1 + W_2 = V$. 
We denote that V is the direct sum of $W_1$ and $W_2$ by writing $V = W_1 \oplus W_2$. 


23. Let $W_1$ and $W_2$ be subspaces of a vector space V. 

(a) Prove that $W_1 + W_2$ is a subspace of V that contains both $W_1$ and 
$W_2$. 

(b) Prove that any subspace of V that contains both $W_1$ and $W_2$ must 
also contain $W_1 + W_2$. 

24. Show that $F^n$ is the direct sum of the subspaces 

$$W_1 = \{(a_1, a_2, \ldots, a_n) \in F^n : a_n = 0\}$$ 

and 

$$W_2 = \{(a_1, a_2, \ldots, a_n) \in F^n : a_1 = a_2 = \cdots = a_{n-1} = 0\}.$$ 

25. Let $W_1$ denote the set of all polynomials f(x) in P(F) such that in the 
representation 

$$f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0,$$ 

we have $a_i = 0$ whenever i is even. Likewise let $W_2$ denote the set of 
all polynomials g(x) in P(F) such that in the representation 

$$g(x) = b_mx^m + b_{m-1}x^{m-1} + \cdots + b_1x + b_0,$$ 

we have $b_i = 0$ whenever i is odd. Prove that $P(F) = W_1 \oplus W_2$. 

26. In $M_{m \times n}(F)$ define $W_1 = \{A \in M_{m \times n}(F) : A_{ij} = 0 \text{ whenever } i > j\}$ 
and $W_2 = \{A \in M_{m \times n}(F) : A_{ij} = 0 \text{ whenever } i \leq j\}$. ($W_1$ is the 
set of all upper triangular matrices defined in Exercise 12.) Show that 
$M_{m \times n}(F) = W_1 \oplus W_2$. 

27. Let V denote the vector space consisting of all upper triangular $n \times n$ 
matrices (as defined in Exercise 12), and let $W_1$ denote the subspace of 
V consisting of all diagonal matrices. Show that $V = W_1 \oplus W_2$, where 
$W_2 = \{A \in V : A_{ij} = 0 \text{ whenever } i \geq j\}$. 
28. A matrix M is called skew-symmetric if $M^t = -M$. Clearly, a skew- 
symmetric matrix is square. Let F be a field. Prove that the set $W_1$ 
of all skew-symmetric $n \times n$ matrices with entries from F is a subspace 
of $M_{n \times n}(F)$. Now assume that F is not of characteristic 2 (see Ap- 
pendix C), and let $W_2$ be the subspace of $M_{n \times n}(F)$ consisting of all 
symmetric $n \times n$ matrices. Prove that $M_{n \times n}(F) = W_1 \oplus W_2$. 

29. Let F be a field that is not of characteristic 2. Define 
$$W_1 = \{A \in M_{n \times n}(F) : A_{ij} = 0 \text{ whenever } i \leq j\}$$ 
and $W_2$ to be the set of all symmetric $n \times n$ matrices with entries 
from F. Both $W_1$ and $W_2$ are subspaces of $M_{n \times n}(F)$. Prove that 
$M_{n \times n}(F) = W_1 \oplus W_2$. Compare this exercise with Exercise 28. 


30. Let $W_1$ and $W_2$ be subspaces of a vector space V. Prove that V is the 
direct sum of $W_1$ and $W_2$ if and only if each vector in V can be uniquely 
written as $x_1 + x_2$, where $x_1 \in W_1$ and $x_2 \in W_2$. 


31. Let W be a subspace of a vector space V over a field F. For any $v \in V$ 
the set $\{v\} + W = \{v + w : w \in W\}$ is called the coset of W containing 
v. It is customary to denote this coset by $v + W$ rather than $\{v\} + W$. 

(a) Prove that $v + W$ is a subspace of V if and only if $v \in W$. 
(b) Prove that $v_1 + W = v_2 + W$ if and only if $v_1 - v_2 \in W$. 


Addition and scalar multiplication by scalars of F can be defined in the 
collection $S = \{v + W : v \in V\}$ of all cosets of W as follows: 

$$(v_1 + W) + (v_2 + W) = (v_1 + v_2) + W$$ 
for all $v_1, v_2 \in V$ and 
$$a(v + W) = av + W$$ 
for all $v \in V$ and $a \in F$. 


(c) Prove that the preceding operations are well defined; that is, show 
that if $v_1 + W = v_1' + W$ and $v_2 + W = v_2' + W$, then 

$$(v_1 + W) + (v_2 + W) = (v_1' + W) + (v_2' + W)$$ 

and 

$$a(v_1 + W) = a(v_1' + W)$$ 

for all $a \in F$. 

(d) Prove that the set S is a vector space with the operations defined in 
(c). This vector space is called the quotient space of V modulo 
W and is denoted by V/W. 
1.4 LINEAR COMBINATIONS AND SYSTEMS OF LINEAR EQUATIONS 


In Section 1.1, it was shown that the equation of the plane through three 
noncollinear points A, B, and C in space is $x = A + su + tv$, where u and 
v denote the vectors beginning at A and ending at B and C, respectively, 
and s and t denote arbitrary real numbers. An important special case occurs 
when A is the origin. In this case, the equation of the plane simplifies to 
$x = su + tv$, and the set of all points in this plane is a subspace of $R^3$. (This 
is proved as Theorem 1.5.) Expressions of the form $su + tv$, where s and t 
are scalars and u and v are vectors, play a central role in the theory of vector 
spaces. The appropriate generalization of such expressions is presented in the 
following definitions. 


Definitions. Let V be a vector space and S a nonempty subset of V. A 
vector $v \in V$ is called a linear combination of vectors of S if there exist 
a finite number of vectors $u_1, u_2, \ldots, u_n$ in S and scalars $a_1, a_2, \ldots, a_n$ in F 
such that $v = a_1u_1 + a_2u_2 + \cdots + a_nu_n$. In this case we also say that v is 
a linear combination of $u_1, u_2, \ldots, u_n$ and call $a_1, a_2, \ldots, a_n$ the coefficients 
of the linear combination. 


Observe that in any vector space V, $0v = 0$ for each $v \in V$. Thus the zero 
vector is a linear combination of any nonempty subset of V. 


Example 1 
TABLE 1.1 Vitamin Content of 100 Grams of Certain Foods 

Food                                              A       B1      B2   Niacin     C 
                                            (units)    (mg)    (mg)    (mg)    (mg) 
Apple butter                                      0    0.01    0.02     0.2       2 
Raw, unpared apples (freshly harvested)          90    0.03    0.02     0.1       4 
Chocolate-coated candy with coconut center        0    0.02    0.07     0.2       0 
Clams (meat only)                               100    0.10    0.18     1.3      10 
Cupcake from mix (dry form)                       0    0.05    0.06     0.3       0 
Cooked farina (unenriched)                     (0)*    0.01    0.01     0.1     (0) 
Jams and preserves                               10    0.01    0.03     0.2       2 
Coconut custard pie (baked from mix)              0    0.02    0.02     0.4       0 
Raw brown rice                                  (0)    0.34    0.05     4.7     (0) 
Soy sauce                                         0    0.02    0.25     0.4       0 
Cooked spaghetti (unenriched)                     0    0.01    0.01     0.3       0 
Raw wild rice                                   (0)    0.45    0.63     6.2     (0) 

Source: Bernice K. Watt and Annabel L. Merrill, Composition of Foods (Agriculture 
Handbook Number 8), Consumer and Food Economics Research Division, U.S. Department of 
Agriculture, Washington, D.C., 1963. 

*Zeros in parentheses indicate that the amount of a vitamin present is either none or too 
small to measure. 
Table 1.1 shows the vitamin content of 100 grams of 12 foods with respect to 
vitamins A, $B_1$ (thiamine), $B_2$ (riboflavin), niacin, and C (ascorbic acid). 


The vitamin content of 100 grams of each food can be recorded as a column 
vector in $R^5$; for example, the vitamin vector for apple butter is 

$$\begin{pmatrix} 0.00 \\ 0.01 \\ 0.02 \\ 0.20 \\ 2.00 \end{pmatrix}.$$ 


Considering the vitamin vectors for cupcake, coconut custard pie, raw brown 
rice, soy sauce, and wild rice, we see that 


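$$\begin{pmatrix} 0.00 \\ 0.05 \\ 0.06 \\ 0.30 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.02 \\ 0.02 \\ 0.40 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.34 \\ 0.05 \\ 4.70 \\ 0.00 \end{pmatrix} + 2\begin{pmatrix} 0.00 \\ 0.02 \\ 0.25 \\ 0.40 \\ 0.00 \end{pmatrix} = \begin{pmatrix} 0.00 \\ 0.45 \\ 0.63 \\ 6.20 \\ 0.00 \end{pmatrix}.$$ 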


Thus the vitamin vector for wild rice is a linear combination of the vitamin 
vectors for cupcake, coconut custard pie, raw brown rice, and soy sauce. So 
100 grams of cupcake, 100 grams of coconut custard pie, 100 grams of raw 
brown rice, and 200 grams of soy sauce provide exactly the same amounts of 
the five vitamins as 100 grams of raw wild rice. Similarly, since 


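$$2\begin{pmatrix} 0.00 \\ 0.01 \\ 0.02 \\ 0.20 \\ 2.00 \end{pmatrix} + \begin{pmatrix} 90.00 \\ 0.03 \\ 0.02 \\ 0.10 \\ 4.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.02 \\ 0.07 \\ 0.20 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.01 \\ 0.01 \\ 0.10 \\ 0.00 \end{pmatrix} + \begin{pmatrix} 10.00 \\ 0.01 \\ 0.03 \\ 0.20 \\ 2.00 \end{pmatrix} + \begin{pmatrix} 0.00 \\ 0.01 \\ 0.01 \\ 0.30 \\ 0.00 \end{pmatrix} = \begin{pmatrix} 100.00 \\ 0.10 \\ 0.18 \\ 1.30 \\ 10.00 \end{pmatrix},$$ 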


200 grams of apple butter, 100 grams of apples, 100 grams of chocolate candy, 
100 grams of farina, 100 grams of jam, and 100 grams of spaghetti provide 
exactly the same amounts of the five vitamins as 100 grams of clams. 
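Both identities can be checked exactly with rational arithmetic. In this sketch the vectors are transcribed from Table 1.1, and the entries shown as (0) are treated as 0, as the text does. The short food names are our own labels.

```python
from fractions import Fraction

def vec(*entries):
    """Exact vitamin vector (A, B1, B2, niacin, C) built from decimal strings."""
    return [Fraction(e) for e in entries]

# Values per 100 grams from Table 1.1; "(0)" entries are taken to be 0.
foods = {
    "apple butter": vec("0", "0.01", "0.02", "0.2", "2"),
    "apples":       vec("90", "0.03", "0.02", "0.1", "4"),
    "candy":        vec("0", "0.02", "0.07", "0.2", "0"),
    "clams":        vec("100", "0.10", "0.18", "1.3", "10"),
    "cupcake":      vec("0", "0.05", "0.06", "0.3", "0"),
    "farina":       vec("0", "0.01", "0.01", "0.1", "0"),
    "jam":          vec("10", "0.01", "0.03", "0.2", "2"),
    "custard pie":  vec("0", "0.02", "0.02", "0.4", "0"),
    "brown rice":   vec("0", "0.34", "0.05", "4.7", "0"),
    "soy sauce":    vec("0", "0.02", "0.25", "0.4", "0"),
    "spaghetti":    vec("0", "0.01", "0.01", "0.3", "0"),
    "wild rice":    vec("0", "0.45", "0.63", "6.2", "0"),
}

def combination(weights):
    """Return the sum of w * foods[name] over the (name, w) pairs in weights."""
    total = [Fraction(0)] * 5
    for name, w in weights.items():
        total = [t + w * x for t, x in zip(total, foods[name])]
    return total

wild_rice = combination({"cupcake": 1, "custard pie": 1, "brown rice": 1, "soy sauce": 2})
clams = combination({"apple butter": 2, "apples": 1, "candy": 1,
                     "farina": 1, "jam": 1, "spaghetti": 1})
print(wild_rice == foods["wild rice"], clams == foods["clams"])  # True True
```

Using `Fraction` rather than `float` matters here. Sums such as 0.05 + 0.02 + 0.34 + 2(0.02) are not represented exactly in binary floating point.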


Throughout Chapters 1 and 2 we encounter many different situations in 
which it is necessary to determine whether or not a vector can be expressed 
as a linear combination of other vectors, and if so, how. This question often 
reduces to the problem of solving a system of linear equations. In Chapter 3, 
we discuss a general method for using matrices to solve any system of linear 
equations. For now, we illustrate how to solve a system of linear equations by 
showing how to determine if the vector (2,6,8) can be expressed as a linear 
combination of 


$$u_1 = (1, 2, 1), \quad u_2 = (-2, -4, -2), \quad u_3 = (0, 2, 3),$$ 
$$u_4 = (2, 0, -3), \quad\text{and}\quad u_5 = (-3, 8, 16).$$ 


Thus we must determine if there are scalars $a_1, a_2, a_3, a_4$, and $a_5$ such that 

$$\begin{aligned}
(2, 6, 8) &= a_1u_1 + a_2u_2 + a_3u_3 + a_4u_4 + a_5u_5 \\
&= a_1(1, 2, 1) + a_2(-2, -4, -2) + a_3(0, 2, 3) + a_4(2, 0, -3) + a_5(-3, 8, 16) \\
&= (a_1 - 2a_2 + 2a_4 - 3a_5,\ 2a_1 - 4a_2 + 2a_3 + 8a_5,\ a_1 - 2a_2 + 3a_3 - 3a_4 + 16a_5).
\end{aligned}$$ 


Hence (2, 6, 8) can be expressed as a linear combination of $u_1$, $u_2$, $u_3$, $u_4$, and 
$u_5$ if and only if there is a 5-tuple of scalars $(a_1, a_2, a_3, a_4, a_5)$ satisfying the 
system of linear equations 


$$\begin{aligned}
a_1 - 2a_2 + 2a_4 - 3a_5 &= 2 \\
2a_1 - 4a_2 + 2a_3 + 8a_5 &= 6 \\
a_1 - 2a_2 + 3a_3 - 3a_4 + 16a_5 &= 8,
\end{aligned} \tag{1}$$ 


which is obtained by equating the corresponding coordinates in the preceding 
equation. 

To solve system (1), we replace it by another system with the same solu- 
tions, but which is easier to solve. The procedure to be used expresses some 
of the unknowns in terms of others by eliminating certain unknowns from 
all the equations except one. To begin, we eliminate $a_1$ from every equation 
except the first by adding $-2$ times the first equation to the second and $-1$ 
times the first equation to the third. The result is the following new system: 


$$\begin{aligned}
a_1 - 2a_2 + 2a_4 - 3a_5 &= 2 \\
2a_3 - 4a_4 + 14a_5 &= 2 \\
3a_3 - 5a_4 + 19a_5 &= 6.
\end{aligned} \tag{2}$$ 


In this case, it happened that while eliminating $a_1$ from every equation 
except the first, we also eliminated $a_2$ from every equation except the first. 
This need not happen in general. We now want to make the coefficient of $a_3$ in 
the second equation equal to 1, and then eliminate $a_3$ from the third equation. 
To do this, we first multiply the second equation by $\frac{1}{2}$, which produces 


$$\begin{aligned}
a_1 - 2a_2 + 2a_4 - 3a_5 &= 2 \\
a_3 - 2a_4 + 7a_5 &= 1 \\
3a_3 - 5a_4 + 19a_5 &= 6.
\end{aligned}$$ 


Next we add —3 times the second equation to the third, obtaining 


$$\begin{aligned}
a_1 - 2a_2 + 2a_4 - 3a_5 &= 2 \\
a_3 - 2a_4 + 7a_5 &= 1 \\
a_4 - 2a_5 &= 3.
\end{aligned} \tag{3}$$ 
We continue by eliminating $a_4$ from every equation of (3) except the third. 
This yields 


$$\begin{aligned}
a_1 - 2a_2 + a_5 &= -4 \\
a_3 + 3a_5 &= 7 \\
a_4 - 2a_5 &= 3.
\end{aligned} \tag{4}$$ 


System (4) is a system of the desired form: It is easy to solve for the first 
unknown present in each of the equations ($a_1$, $a_3$, and $a_4$) in terms of the 
other unknowns ($a_2$ and $a_5$). Rewriting system (4) in this form, we find that 


$$\begin{aligned}
a_1 &= 2a_2 - a_5 - 4 \\
a_3 &= -3a_5 + 7 \\
a_4 &= 2a_5 + 3.
\end{aligned}$$ 


Thus for any choice of scalars $a_2$ and $a_5$, a vector of the form 

$$(a_1, a_2, a_3, a_4, a_5) = (2a_2 - a_5 - 4,\ a_2,\ -3a_5 + 7,\ 2a_5 + 3,\ a_5)$$ 


is a solution to system (1). In particular, the vector $(-4, 0, 7, 3, 0)$ obtained 
by setting $a_2 = 0$ and $a_5 = 0$ is a solution to (1). Therefore 

$$(2, 6, 8) = -4u_1 + 0u_2 + 7u_3 + 3u_4 + 0u_5,$$ 

so that (2, 6, 8) is a linear combination of $u_1$, $u_2$, $u_3$, $u_4$, and $u_5$. 
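To confirm that every choice of $a_2$ and $a_5$ gives a solution, one can substitute the parametrized vector back into $a_1u_1 + \cdots + a_5u_5$. The sketch below does this for a small grid of integer parameter values. The function names are our own, and a finite grid only spot-checks the general claim.

```python
from itertools import product

u = [(1, 2, 1), (-2, -4, -2), (0, 2, 3), (2, 0, -3), (-3, 8, 16)]

def combo(coeffs):
    """Return a1*u1 + a2*u2 + ... + a5*u5 as a 3-tuple."""
    return tuple(sum(a * v[k] for a, v in zip(coeffs, u)) for k in range(3))

def solution(a2, a5):
    """The general solution of system (1), parametrized by a2 and a5."""
    return (2 * a2 - a5 - 4, a2, -3 * a5 + 7, 2 * a5 + 3, a5)

print(solution(0, 0))         # (-4, 0, 7, 3, 0)
print(combo(solution(0, 0)))  # (2, 6, 8)
grid_ok = all(combo(solution(s, t)) == (2, 6, 8) for s, t in product(range(-3, 4), repeat=2))
print(grid_ok)                # True
```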
The procedure just illustrated uses three types of operations to simplify 
the original system: 


1. interchanging the order of any two equations in the system; 

2. multiplying any equation in the system by a nonzero constant; 

3. adding a constant multiple of any equation to another equation in the 
system. 


In Section 3.4, we prove that these operations do not change the set of 
solutions to the original system. Note that we employed these operations to 
obtain a system of equations that had the following properties: 


1. The first nonzero coefficient in each equation is one. 

2. If an unknown is the first unknown with a nonzero coefficient in some 
equation, then that unknown occurs with a zero coefficient in each of 
the other equations. 

3. The first unknown with a nonzero coefficient in any equation has a 
larger subscript than the first unknown with a nonzero coefficient in 
any preceding equation. 
To help clarify the meaning of these properties, note that none of the 
following systems meets these requirements. 


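$$\begin{aligned}
x_1 + 3x_2 + x_4 &= 7 \\
2x_3 - 5x_4 &= -1
\end{aligned} \tag{5}$$ 

$$\begin{aligned}
x_1 - 2x_2 + 3x_3 + x_5 &= -5 \\
x_3 - 2x_5 &= 9 \\
x_4 + 3x_5 &= 6
\end{aligned} \tag{6}$$ 

$$\begin{aligned}
x_1 - 2x_3 + x_5 &= 1 \\
x_4 - 6x_5 &= 0 \\
x_2 + 5x_3 - 3x_5 &= 2.
\end{aligned} \tag{7}$$ 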


Specifically, system (5) does not satisfy property 1 because the first nonzero 
coefficient in the second equation is 2; system (6) does not satisfy property 2 
because x3, the first unknown with a nonzero coefficient in the second equa- 
tion, occurs with a nonzero coefficient in the first equation; and system (7) 
does not satisfy property 3 because x2, the first unknown with a nonzero 
coefficient in the third equation, does not have a larger subscript than x4, the 
first unknown with a nonzero coefficient in the second equation. 

Once a system with properties 1, 2, and 3 has been obtained, it is easy 
to solve for some of the unknowns in terms of the others (as in the preceding 
example). If, however, in the course of using operations 1, 2, and 3 a system 
containing an equation of the form 0 = c, where c is nonzero, is obtained, 
then the original system has no solutions. (See Example 2.) 

We return to the study of systems of linear equations in Chapter 3. We 
discuss there the theoretical basis for this method of solving systems of linear 
equations and further simplify the procedure by use of matrices. 
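The elimination procedure can be sketched in code. This function is our own illustration, not the matrix method of Chapter 3. It applies only operations 1, 2, and 3 to the rows of an augmented matrix (coefficients followed by the constant), aims for properties 1, 2, and 3, and reports whether an equation of the form 0 = c with c nonzero appears. Exact `Fraction` arithmetic avoids rounding errors.

```python
from fractions import Fraction

def reduce_system(rows):
    """Reduce an augmented matrix [coefficients | constant].

    Uses only the three operations in the text: interchanging two equations,
    multiplying an equation by a nonzero constant, and adding a multiple of
    one equation to another. Returns (reduced rows, consistent), where
    consistent is False if some equation reads 0 = c with c != 0.
    """
    rows = [[Fraction(x) for x in row] for row in rows]
    n_unknowns = len(rows[0]) - 1
    pivot_row = 0
    for col in range(n_unknowns):
        # Find an equation at or below pivot_row that involves this unknown.
        r = next((i for i in range(pivot_row, len(rows)) if rows[i][col] != 0), None)
        if r is None:
            continue
        rows[pivot_row], rows[r] = rows[r], rows[pivot_row]       # operation 1
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]      # operation 2
        for i in range(len(rows)):                                 # operation 3
            if i != pivot_row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * p for x, p in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == len(rows):
            break
    consistent = not any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in rows)
    return rows, consistent

# System (1): columns are a1, ..., a5 followed by the right-hand side.
system1 = [[1, -2, 0, 2, -3, 2],
           [2, -4, 2, 0, 8, 6],
           [1, -2, 3, -3, 16, 8]]
reduced, ok = reduce_system(system1)
for row in reduced:
    print([str(x) for x in row])
print(ok)   # True

# A small inconsistent system of our own: x + y = 1, 2x + 2y = 3.
_, ok2 = reduce_system([[1, 1, 1], [2, 2, 3]])
print(ok2)  # False
```

Applied to system (1), the three printed rows are the coefficients of system (4). The second call reaches the equation 0 = 1, so it reports that the system has no solutions.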


Example 2 
We claim that 


$$2x^3 - 2x^2 + 12x - 6$$ 
is a linear combination of 
$$x^3 - 2x^2 - 5x - 3 \quad\text{and}\quad 3x^3 - 5x^2 - 4x - 9$$ 
in $P_3(R)$, but that 
$$3x^3 - 2x^2 + 7x + 8$$ 
is not. In the first case we wish to find scalars a and b such that 


$$\begin{aligned}
2x^3 - 2x^2 + 12x - 6 &= a(x^3 - 2x^2 - 5x - 3) + b(3x^3 - 5x^2 - 4x - 9) \\
&= (a + 3b)x^3 + (-2a - 5b)x^2 + (-5a - 4b)x + (-3a - 9b).
\end{aligned}$$ 


Thus we are led to the following system of linear equations: 


$$\begin{aligned}
a + 3b &= 2 \\
-2a - 5b &= -2 \\
-5a - 4b &= 12 \\
-3a - 9b &= -6.
\end{aligned}$$ 


Adding appropriate multiples of the first equation to the others in order to 
eliminate a, we find that 


$$\begin{aligned}
a + 3b &= 2 \\
b &= 2 \\
11b &= 22 \\
0b &= 0.
\end{aligned}$$ 


Now adding the appropriate multiples of the second equation to the others 
yields 


$$\begin{aligned}
a &= -4 \\
b &= 2 \\
0 &= 0 \\
0 &= 0.
\end{aligned}$$ 


Hence 


$$2x^3 - 2x^2 + 12x - 6 = -4(x^3 - 2x^2 - 5x - 3) + 2(3x^3 - 5x^2 - 4x - 9).$$ 


In the second case, we wish to show that there are no scalars a and b for 
which 

$$3x^3 - 2x^2 + 7x + 8 = a(x^3 - 2x^2 - 5x - 3) + b(3x^3 - 5x^2 - 4x - 9).$$ 


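Using the preceding technique, we obtain a system of linear equations 

$$\begin{aligned}
a + 3b &= 3 \\
-2a - 5b &= -2 \\
-5a - 4b &= 7 \\
-3a - 9b &= 8.
\end{aligned} \tag{8}$$ 

Eliminating a as before yields 

$$\begin{aligned}
a + 3b &= 3 \\
b &= 4 \\
11b &= 22 \\
0 &= 17.
\end{aligned}$$ 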
But the presence of the inconsistent equation $0 = 17$ indicates that (8) 
has no solutions. Hence $3x^3 - 2x^2 + 7x + 8$ is not a linear combination of 
$x^3 - 2x^2 - 5x - 3$ and $3x^3 - 5x^2 - 4x - 9$. 
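Because only two scalars are involved, Example 2 can be sketched by solving the first two coefficient equations (those for $x^3$ and $x^2$) directly and then testing the other two. This is our own shortcut for this particular example, not a general method. It relies on the fact that those two equations alone determine a and b uniquely. Polynomials are stored as tuples of the coefficients of $x^3, x^2, x, 1$.

```python
from fractions import Fraction

p = (1, -2, -5, -3)   # x^3 - 2x^2 - 5x - 3: coefficients of x^3, x^2, x, 1
q = (3, -5, -4, -9)   # 3x^3 - 5x^2 - 4x - 9

def coefficients_for(target):
    """Try to write target = a*p + b*q; return (a, b), or None if impossible.

    The x^3 and x^2 equations form a 2 x 2 system with determinant
    1*(-5) - 3*(-2) = 1, so Cramer's rule gives the only candidate (a, b);
    the x and constant equations are then checked.
    """
    det = p[0] * q[1] - q[0] * p[1]
    a = Fraction(target[0] * q[1] - q[0] * target[1], det)
    b = Fraction(p[0] * target[1] - target[0] * p[1], det)
    if all(a * pi + b * qi == ti for pi, qi, ti in zip(p, q, target)):
        return a, b
    return None

print(coefficients_for((2, -2, 12, -6)))  # (Fraction(-4, 1), Fraction(2, 1))
print(coefficients_for((3, -2, 7, 8)))    # None
```

In the second case the candidate is a = -9, b = 4. This satisfies the first two equations but gives 29 rather than 7 for the coefficient of x, which matches the inconsistency found above.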
Throughout this book, we form the set of all linear combinations of some 
set of vectors. We now name such a set of linear combinations. 


Definition. Let S be a nonempty subset of a vector space V. The span 
of S, denoted span(S), is the set consisting of all linear combinations of the 
vectors in S. For convenience, we define span(∅) = {0}. 


In $R^3$, for instance, the span of the set $\{(1, 0, 0), (0, 1, 0)\}$ consists of all 
vectors in $R^3$ that have the form $a(1, 0, 0) + b(0, 1, 0) = (a, b, 0)$ for some 
scalars a and b. Thus the span of $\{(1, 0, 0), (0, 1, 0)\}$ contains all the points in 
the xy-plane. In this case, the span of the set is a subspace of $R^3$. This fact 
is true in general. 


Theorem 1.5. The span of any subset S of a vector space V is a subspace 
of V. Moreover, any subspace of V that contains S must also contain the 
span of S. 


Proof. This result is immediate if S = ∅ because span(∅) = {0}, which 
is a subspace that is contained in any subspace of V. 

If S ≠ ∅, then S contains a vector z. So $0z = 0$ is in span(S). Let 
$x, y \in \operatorname{span}(S)$. Then there exist vectors $u_1, u_2, \ldots, u_m, v_1, v_2, \ldots, v_n$ in S 
and scalars $a_1, a_2, \ldots, a_m, b_1, b_2, \ldots, b_n$ such that 

$$x = a_1u_1 + a_2u_2 + \cdots + a_mu_m \quad\text{and}\quad y = b_1v_1 + b_2v_2 + \cdots + b_nv_n.$$ 

Then 

$$x + y = a_1u_1 + a_2u_2 + \cdots + a_mu_m + b_1v_1 + b_2v_2 + \cdots + b_nv_n$$ 
and, for any scalar c, 
$$cx = (ca_1)u_1 + (ca_2)u_2 + \cdots + (ca_m)u_m$$ 

are clearly linear combinations of the vectors in S; so $x + y$ and $cx$ are in 
span(S). Thus span(S) is a subspace of V. 

Now let W denote any subspace of V that contains S. If $w \in \operatorname{span}(S)$, then 
w has the form $w = c_1w_1 + c_2w_2 + \cdots + c_kw_k$ for some vectors $w_1, w_2, \ldots, w_k$ in 
S and some scalars $c_1, c_2, \ldots, c_k$. Since $S \subseteq W$, we have $w_1, w_2, \ldots, w_k \in W$. 
Therefore $w = c_1w_1 + c_2w_2 + \cdots + c_kw_k$ is in W by Exercise 20 of Section 
1.3. Because w, an arbitrary vector in span(S), belongs to W, it follows that 
$\operatorname{span}(S) \subseteq W$. ∎ 


Definition. A subset S of a vector space V generates (or spans) V 
if span(S) = V. In this case, we also say that the vectors of S generate (or 
span) V. 
Example 3 


The vectors (1,1,0), (1,0,1), and (0,1, 1) generate R® since an arbitrary vector 
(a1,@2,a3) in R® is a linear combination of the three given vectors; in fact, 
the scalars r,s, and t for which 


eC; 1,0) te s(1,0, 1) a t(0, 1, 1) = (a1, a2, a3) 


are 


1 1 1 
r= 3 (41 + a2 — as), s= 3 (a1 — a2 + as), and t= 5 (a1 + a2 + a3). 4 
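The following minimal Python sketch checks these formulas. It uses exact arithmetic from the standard `fractions` module, and the names `coefficients` and `combine` are illustrative rather than taken from the text. The sketch computes r, s, and t for a sample vector and then rebuilds the vector from them.

```python
from fractions import Fraction

# The three generators from Example 3.
U, V, W = (1, 1, 0), (1, 0, 1), (0, 1, 1)

def coefficients(a1, a2, a3):
    """Return (r, s, t) with r*U + s*V + t*W = (a1, a2, a3), using Example 3's formulas."""
    half = Fraction(1, 2)
    return (half * (a1 + a2 - a3),
            half * (a1 - a2 + a3),
            half * (-a1 + a2 + a3))

def combine(r, s, t):
    """Form the linear combination r*U + s*V + t*W coordinate by coordinate."""
    return tuple(r * u + s * v + t * w for u, v, w in zip(U, V, W))

r, s, t = coefficients(3, -1, 4)
print(r, s, t)                        # -1 4 0
assert combine(r, s, t) == (3, -1, 4)
```

The formulas divide by 2, so this check assumes a field such as ℚ or ℝ in which 2 ≠ 0. Over a field of characteristic 2 the argument breaks down (compare Exercise 8 of Section 1.5).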


Example 4 


The polynomials x² + 3x − 2, 2x² + 5x − 3, and −x² − 4x + 4 generate P₂(ℝ)
since each of the three given polynomials belongs to P₂(ℝ) and each polynomial
ax² + bx + c in P₂(ℝ) is a linear combination of these three, namely,

$$(-8a + 5b + 3c)(x^2 + 3x - 2) + (4a - 2b - c)(2x^2 + 5x - 3) + (-a + b + c)(-x^2 - 4x + 4) = ax^2 + bx + c.$$
◆
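The identity above can be checked mechanically by comparing coefficient vectors. In the Python sketch below, each polynomial is stored as its coefficients of (x², x, 1), and the names `P1`, `P2`, `P3`, and `combination` are our own. The map (a, b, c) ↦ (resulting coefficients) is linear. So agreeing with the identity on 1, x, and x² already proves the identity, and the grid below contains those three cases.

```python
from itertools import product

# Coefficient vectors (x^2, x, 1) of the three polynomials in Example 4.
P1 = (1, 3, -2)    # x^2 + 3x - 2
P2 = (2, 5, -3)    # 2x^2 + 5x - 3
P3 = (-1, -4, 4)   # -x^2 - 4x + 4

def combination(a, b, c):
    """Coefficients of the combination in Example 4 aimed at ax^2 + bx + c."""
    k1 = -8 * a + 5 * b + 3 * c
    k2 = 4 * a - 2 * b - c
    k3 = -a + b + c
    return tuple(k1 * p + k2 * q + k3 * r for p, q, r in zip(P1, P2, P3))

# Check a small grid of integer targets (it includes x^2, x, and 1).
for a, b, c in product(range(-2, 3), repeat=3):
    assert combination(a, b, c) == (a, b, c)
print("identity holds")
```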


Example 5 


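The matrices

$$\begin{pmatrix}1 & 1\\ 1 & 0\end{pmatrix}, \quad \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}, \quad \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}, \quad\text{and}\quad \begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix}$$

generate M₂ₓ₂(ℝ) since an arbitrary matrix A in M₂ₓ₂(ℝ) can be expressed
as a linear combination of the four given matrices as follows:

$$\begin{aligned}
\begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}
&= \left(\tfrac{1}{3}a_{11} + \tfrac{1}{3}a_{12} + \tfrac{1}{3}a_{21} - \tfrac{2}{3}a_{22}\right)\begin{pmatrix}1 & 1\\ 1 & 0\end{pmatrix}
+ \left(\tfrac{1}{3}a_{11} + \tfrac{1}{3}a_{12} - \tfrac{2}{3}a_{21} + \tfrac{1}{3}a_{22}\right)\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}\\
&\quad + \left(\tfrac{1}{3}a_{11} - \tfrac{2}{3}a_{12} + \tfrac{1}{3}a_{21} + \tfrac{1}{3}a_{22}\right)\begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}
+ \left(-\tfrac{2}{3}a_{11} + \tfrac{1}{3}a_{12} + \tfrac{1}{3}a_{21} + \tfrac{1}{3}a_{22}\right)\begin{pmatrix}0 & 1\\ 1 & 1\end{pmatrix}.
\end{aligned}$$

On the other hand, the matrices

$$\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}, \quad \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}, \quad\text{and}\quad \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}$$

do not generate M₂ₓ₂(ℝ) because each of these matrices has equal diagonal
entries. So any linear combination of these matrices has equal diagonal
entries. Hence not every 2 × 2 matrix is a linear combination of these three
matrices.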
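Each of the four generators has exactly one zero entry. The coefficient displayed for a generator equals one third of the sum of all entries of A, minus the entry of A in the position where that generator has its 0. For example, ⅓a₁₁ + ⅓a₁₂ + ⅓a₂₁ − ⅔a₂₂ = ⅓(a₁₁ + a₁₂ + a₂₁ + a₂₂) − a₂₂. The Python sketch below uses that compact form and checks that the combination reproduces A. The names `GENERATORS`, `coefficients`, and `combine` are our own, and matrices are flattened row by row.

```python
from fractions import Fraction

# The four generators of Example 5, each flattened row by row as (m11, m12, m21, m22).
GENERATORS = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)]

def coefficients(a11, a12, a21, a22):
    """Scalars of Example 5; e.g. the first is (1/3)(a11 + a12 + a21) - (2/3)a22.

    Each equals one third of the total of A's entries minus the entry of A
    in the position where the corresponding generator has its zero.
    """
    third_of_total = Fraction(a11 + a12 + a21 + a22, 3)
    return (third_of_total - a22, third_of_total - a21,
            third_of_total - a12, third_of_total - a11)

def combine(coeffs):
    """Entrywise sum of c_k times the k-th generator."""
    return tuple(sum(c * g[i] for c, g in zip(coeffs, GENERATORS)) for i in range(4))

A = (5, -2, 7, 3)                    # the matrix with rows (5, -2) and (7, 3)
assert combine(coefficients(*A)) == A
```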


At the beginning of this section we noted that the equation of a plane
through three noncollinear points in space, one of which is the origin, is of
the form x = su + tv, where u, v ∈ ℝ³ and s and t are scalars. Thus x ∈ ℝ³ is
a linear combination of u, v ∈ ℝ³ if and only if x lies in the plane containing
u and v. (See Figure 1.5.)


Figure 1.5 
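When u and v are not parallel, membership of x in this plane can also be tested numerically. The test is that x lies in span({u, v}) exactly when the 3 × 3 determinant with rows u, v, x is zero. This relies on a standard fact not proved in this section (a 3 × 3 determinant vanishes exactly when its rows are linearly dependent), so the Python sketch below is an illustration only. The names `det3` and `in_plane` are our own.

```python
def det3(r1, r2, r3):
    """Determinant of the 3x3 matrix whose rows are r1, r2, r3 (cofactor expansion)."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))

def in_plane(x, u, v):
    """True if x = s*u + t*v for some scalars s, t; assumes u and v are not parallel."""
    return det3(u, v, x) == 0

u, v = (1, 2, 3), (0, 1, 4)
x = tuple(2 * ui - 5 * vi for ui, vi in zip(u, v))   # x = 2u - 5v = (2, -1, -14)
print(in_plane(x, u, v))            # True
print(in_plane((0, 0, 1), u, v))    # False
```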


Usually there are many different subsets that generate a subspace W. (See 
Exercise 13.) It is natural to seek a subset of W that generates W and is as 
small as possible. In the next section we explore the circumstances under 
which a vector can be removed from a generating set to obtain a smaller 
generating set. 


EXERCISES 


1. Label the following statements as true or false. 


(a) The zero vector is a linear combination of any nonempty set of 
vectors. 

(b) The span of ∅ is ∅.

(c) If S is a subset of a vector space V, then span(S) equals the inter- 
section of all subspaces of V that contain S. 

(d) In solving a system of linear equations, it is permissible to multiply 
an equation by any constant. 

(e) In solving a system of linear equations, it is permissible to add any 
multiple of one equation to another. 

(f) Every system of linear equations has a solution. 
2. Solve the following systems of linear equations by the method introduced
in this section.

(a)
$$\begin{aligned}
2x_1 - 2x_2 - 3x_3 &= -2\\
3x_1 - 3x_2 - 2x_3 + 5x_4 &= 7\\
x_1 - x_2 - 2x_3 - x_4 &= -3
\end{aligned}$$

(b)
$$\begin{aligned}
3x_1 - 7x_2 + 4x_3 &= 10\\
x_1 - 2x_2 + x_3 &= 3\\
2x_1 - x_2 - 2x_3 &= 6
\end{aligned}$$

(c)
$$\begin{aligned}
x_1 + 2x_2 - x_3 + x_4 &= 5\\
x_1 + 4x_2 - 3x_3 - 3x_4 &= 6\\
2x_1 + 3x_2 - x_3 + 4x_4 &= 8
\end{aligned}$$

(d)
$$\begin{aligned}
x_1 + 2x_2 + 2x_3 &= 2\\
x_1 + 8x_3 + 5x_4 &= -6\\
x_1 + x_2 + 5x_3 + 5x_4 &= 3
\end{aligned}$$

(e)
$$\begin{aligned}
x_1 + 2x_2 - 4x_3 - x_4 + x_5 &= 7\\
-x_1 + 10x_3 - 3x_4 - 4x_5 &= -16\\
2x_1 + 5x_2 - 5x_3 - 4x_4 - x_5 &= 2\\
4x_1 + 11x_2 - 7x_3 - 10x_4 - 2x_5 &= 7
\end{aligned}$$

(f)
$$\begin{aligned}
x_1 + 2x_2 + 6x_3 &= -1\\
2x_1 + x_2 + x_3 &= 8\\
3x_1 + x_2 - x_3 &= 15\\
x_1 + 3x_2 + 10x_3 &= -5
\end{aligned}$$

3. For each of the following lists of vectors in ℝ³, determine whether the
first vector can be expressed as a linear combination of the other two.
(a) (−2, 0, 3), (1, 3, 0), (2, 4, −1)
(b) (1, 2, −3), (−3, 2, 1), (2, −1, −1)
(c) (3, 4, 1), (1, −2, 1), (−2, −1, 1)
(d) (2, −1, 0), (1, 2, −3), (1, −3, 2)
(e) (5, 1, −5), (1, −2, −3), (−2, 3, −4)
(f) (−2, 2, 2), (1, 2, −1), (−3, −3, 3)

4. For each list of polynomials in P₃(ℝ), determine whether the first polynomial
can be expressed as a linear combination of the other two.
(a) x³ − 3x + 5, x³ + 2x² − x + 1, x³ + 3x² − 1
(b) 4x³ + 2x² − 6, x³ − 2x² + 4x + 1, 3x³ − 6x² + x + 4
(c) −2x³ − 11x² + 3x + 2, x³ − 2x² + 3x − 1, 2x³ + x² + 3x − 2
(d) x³ + x² + 2x + 13, 2x³ − 3x² + 4x + 1, x³ − x² + 2x + 3
(e) x³ − 8x² + 4x, x³ − 2x² + 3x − 1, x³ − 2x + 3
(f) 6x³ − 3x² + x + 2, x³ − x² + 2x + 3, 2x³ − 3x + 1
5. In each part, determine whether the given vector is in the span of S.


(a) (2, −1, 1), S = {(1, 0, 2), (−1, 1, 1)}


(c) (−1, 1, 1, 2), S = {(1, 0, 1, −1), (0, 1, 1, 1)}

(d) (2, −1, 1, −3), S = {(1, 0, 1, −1), (0, 1, 1, 1)}

(e) −x³ + 2x² + 3x + 3, S = {x³ + x² + x + 1, x² + x + 1, x + 1}
(f) 2x³ − x² + x + 3, S = {x³ + x² + x + 1, x² + x + 1, x + 1}


So 


© (EE 909.69 
OMe een ECan), 


6. Show that the vectors (1,1,0), (1,0,1), and (0,1,1) generate F³.


7. In Fⁿ, let eⱼ denote the vector whose jth coordinate is 1 and whose
other coordinates are 0. Prove that {e₁, e₂, …, eₙ} generates Fⁿ.


8. Show that Pₙ(F) is generated by {1, x, …, xⁿ}.


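9. Show that the matrices

$$\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \quad \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}, \quad \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}, \quad\text{and}\quad \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}$$

generate M₂ₓ₂(F).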


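10. Show that if

$$M_1 = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \quad M_2 = \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}, \quad\text{and}\quad M_3 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},$$

then the span of {M₁, M₂, M₃} is the set of all symmetric 2 × 2 matrices.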


11. Prove that span({x}) = {ax : a ∈ F} for any vector x in a vector space.
Interpret this result geometrically in ℝ³.


12. Show that a subset W of a vector space V is a subspace of V if and only 
if span(W) = W. 


13. Show that if S₁ and S₂ are subsets of a vector space V such that S₁ ⊆ S₂,
then span(S₁) ⊆ span(S₂). In particular, if S₁ ⊆ S₂ and span(S₁) = V,
deduce that span(S₂) = V.


14. Show that if S₁ and S₂ are arbitrary subsets of a vector space V, then
span(S₁ ∪ S₂) = span(S₁) + span(S₂). (The sum of two subsets is defined
in the exercises of Section 1.3.)
15. Let S₁ and S₂ be subsets of a vector space V. Prove that span(S₁ ∩ S₂) ⊆
span(S₁) ∩ span(S₂). Give an example in which span(S₁ ∩ S₂) and
span(S₁) ∩ span(S₂) are equal and one in which they are unequal.


16. Let V be a vector space and S a subset of V with the property that
whenever v₁, v₂, …, vₙ ∈ S and a₁v₁ + a₂v₂ + ⋯ + aₙvₙ = 0, then
a₁ = a₂ = ⋯ = aₙ = 0. Prove that every vector in the span of S can
be uniquely written as a linear combination of vectors of S.


17. Let W be a subspace of a vector space V. Under what conditions are 
there only a finite number of distinct subsets S of W such that S gen- 
erates W? 


1.5. LINEAR DEPENDENCE AND LINEAR INDEPENDENCE 


Suppose that V is a vector space over an infinite field and that W is a subspace
of V. Unless W is the zero subspace, W is an infinite set. It is desirable to
find a “small” finite subset S that generates W because we can then describe
each vector in W as a linear combination of the finite number of vectors in
S. Indeed, the smaller that S is, the fewer computations that are required
to represent vectors in W. Consider, for example, the subspace W of ℝ³
generated by S = {u₁, u₂, u₃, u₄}, where u₁ = (2, −1, 4), u₂ = (1, −1, 3),
u₃ = (1, 1, −1), and u₄ = (1, −2, −1). Let us attempt to find a proper subset
of S that also generates W. The search for this subset is related to the
question of whether or not some vector in S is a linear combination of the
other vectors in S. Now u₄ is a linear combination of the other vectors in S
if and only if there are scalars a₁, a₂, and a₃ such that


$$u_4 = a_1u_1 + a_2u_2 + a_3u_3,$$

that is, if and only if there are scalars a₁, a₂, and a₃ satisfying

$$(1, -2, -1) = (2a_1 + a_2 + a_3,\; -a_1 - a_2 + a_3,\; 4a_1 + 3a_2 - a_3).$$

Thus u₄ is a linear combination of u₁, u₂, and u₃ if and only if the system of
linear equations

$$\begin{aligned}
2a_1 + a_2 + a_3 &= 1\\
-a_1 - a_2 + a_3 &= -2\\
4a_1 + 3a_2 - a_3 &= -1
\end{aligned}$$

has a solution. The reader should verify that no such solution exists. This
does not, however, answer our question of whether some vector in S is a linear
combination of the other vectors in S. It can be shown, in fact, that u₃ is a
linear combination of u₁, u₂, and u₄, namely, u₃ = 2u₁ − 3u₂ + 0u₄.
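A small rank routine is handy for experimenting with questions like this one. The Python sketch below defines our own helper `rank`, which uses exact `Fraction` arithmetic to count the pivots that Gaussian elimination finds. That pivot count equals the size of a largest linearly independent sublist, a fact the text has not established at this point, so treat the sketch as a computational aid only. Adjoining u₄ to {u₁, u₂, u₃} raises the count from 2 to 3. This is consistent with the system above having no solution. The relation u₃ = 2u₁ − 3u₂ + 0u₄ is checked directly.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by exact Gaussian elimination on the rows `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0  # pivots found so far; rows[:r] are the pivot rows
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

u1, u2, u3, u4 = (2, -1, 4), (1, -1, 3), (1, 1, -1), (1, -2, -1)

# u4 is a combination of u1, u2, u3 iff adjoining it does not raise the rank.
print(rank([u1, u2, u3]), rank([u1, u2, u3, u4]))   # 2 3
# The relation found in the text: u3 = 2u1 - 3u2 + 0u4.
assert tuple(2 * a - 3 * b + 0 * d for a, b, d in zip(u1, u2, u4)) == u3
```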
In the preceding example, checking that some vector in S is a linear
combination of the other vectors in S could require that we solve several
different systems of linear equations before we determine which, if any, of
u₁, u₂, u₃, and u₄ is a linear combination of the others. By formulating
our question differently, we can save ourselves some work. Note that since
u₃ = 2u₁ − 3u₂ + 0u₄, we have

$$-2u_1 + 3u_2 + u_3 - 0u_4 = 0.$$


That is, because some vector in S is a linear combination of the others, the
zero vector can be expressed as a linear combination of the vectors in S using
coefficients that are not all zero. The converse of this statement is also true:
If the zero vector can be written as a linear combination of the vectors in S
in which not all the coefficients are zero, then some vector in S is a linear
combination of the others. For instance, in the example above, the equation
−2u₁ + 3u₂ + u₃ − 0u₄ = 0 can be solved for any vector having a nonzero
coefficient; so u₁, u₂, or u₃ (but not u₄) can be written as a linear combination
of the other three vectors. Thus, rather than asking whether some vector in
S is a linear combination of the other vectors in S, it is more efficient to
ask whether the zero vector can be expressed as a linear combination of the
vectors in S with coefficients that are not all zero. This observation leads us
to the following definition.


Definition. A subset S of a vector space V is called linearly dependent
if there exist a finite number of distinct vectors u₁, u₂, …, uₙ in S and scalars
a₁, a₂, …, aₙ, not all zero, such that

$$a_1u_1 + a_2u_2 + \cdots + a_nu_n = 0.$$

In this case we also say that the vectors of S are linearly dependent.


For any vectors u₁, u₂, …, uₙ, we have a₁u₁ + a₂u₂ + ⋯ + aₙuₙ = 0
if a₁ = a₂ = ⋯ = aₙ = 0. We call this the trivial representation of 0 as
a linear combination of u₁, u₂, …, uₙ. Thus, for a set to be linearly dependent,
there must exist a nontrivial representation of 0 as a linear combination
of vectors in the set. Consequently, any subset of a vector space that contains
the zero vector is linearly dependent, because 0 = 1·0 is a nontrivial
representation of 0 as a linear combination of vectors in the set.


Example 1 
Consider the set 


$$S = \{(1,3,-4,2), (2,2,-4,0), (1,-3,2,-4), (-1,0,1,0)\}$$

in ℝ⁴. We show that S is linearly dependent and then express one of the
vectors in S as a linear combination of the other vectors in S. To show that
S is linearly dependent, we must find scalars a₁, a₂, a₃, and a₄, not all zero,
such that

$$a_1(1,3,-4,2) + a_2(2,2,-4,0) + a_3(1,-3,2,-4) + a_4(-1,0,1,0) = 0.$$

Finding such scalars amounts to finding a nonzero solution to the system of
linear equations

$$\begin{aligned}
a_1 + 2a_2 + a_3 - a_4 &= 0\\
3a_1 + 2a_2 - 3a_3 &= 0\\
-4a_1 - 4a_2 + 2a_3 + a_4 &= 0\\
2a_1 - 4a_3 &= 0.
\end{aligned}$$

One such solution is a₁ = 4, a₂ = −3, a₃ = 2, and a₄ = 0. Thus S is a
linearly dependent subset of ℝ⁴, and

$$4(1,3,-4,2) - 3(2,2,-4,0) + 2(1,-3,2,-4) + 0(-1,0,1,0) = 0.$$
◆
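A direct check of this example follows. The Python sketch below first verifies that the stated coefficients produce the zero vector. It then uses a self-contained pivot-counting helper, `rank` (our own name, exact `Fraction` arithmetic), to find only 3 pivots among the four vectors. Fewer pivots than vectors is another way to see that S is linearly dependent.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by exact Gaussian elimination on the rows `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

S = [(1, 3, -4, 2), (2, 2, -4, 0), (1, -3, 2, -4), (-1, 0, 1, 0)]
coeffs = (4, -3, 2, 0)

combo = tuple(sum(c * v[i] for c, v in zip(coeffs, S)) for i in range(4))
print(combo)        # (0, 0, 0, 0)
print(rank(S))      # 3
```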


Example 2 
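In M₂ₓ₃(ℝ), the set

$$\left\{\begin{pmatrix}1 & -3 & 2\\ -4 & 0 & 5\end{pmatrix}, \begin{pmatrix}-3 & 7 & 4\\ 6 & -2 & -7\end{pmatrix}, \begin{pmatrix}-2 & 3 & 11\\ -1 & -3 & 2\end{pmatrix}\right\}$$

is linearly dependent because

$$5\begin{pmatrix}1 & -3 & 2\\ -4 & 0 & 5\end{pmatrix} + 3\begin{pmatrix}-3 & 7 & 4\\ 6 & -2 & -7\end{pmatrix} - 2\begin{pmatrix}-2 & 3 & 11\\ -1 & -3 & 2\end{pmatrix} = \begin{pmatrix}0 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}.$$
◆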
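The relation in Example 2 can be confirmed entry by entry. In the Python sketch below, matrices are stored as nested lists of rows, and `lin_comb` is our own helper name.

```python
# The three 2x3 matrices of Example 2, as nested lists of rows.
A = [[1, -3, 2], [-4, 0, 5]]
B = [[-3, 7, 4], [6, -2, -7]]
C = [[-2, 3, 11], [-1, -3, 2]]

def lin_comb(coeffs, mats):
    """Entrywise linear combination sum_k coeffs[k] * mats[k] of same-sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(c * m[i][j] for c, m in zip(coeffs, mats)) for j in range(cols)]
            for i in range(rows)]

print(lin_comb([5, 3, -2], [A, B, C]))   # [[0, 0, 0], [0, 0, 0]]
```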


Definition. A subset S of a vector space that is not linearly dependent 
is called linearly independent. As before, we also say that the vectors of 
S are linearly independent. 


The following facts about linearly independent sets are true in any vector 
space. 


1. The empty set is linearly independent, for linearly dependent sets must 
be nonempty. 

2. A set consisting of a single nonzero vector is linearly independent. For
if {u} is linearly dependent, then au = 0 for some nonzero scalar a.
Thus

$$u = a^{-1}(au) = a^{-1}0 = 0.$$


3. A set is linearly independent if and only if the only representations of 
0 as linear combinations of its vectors are trivial representations. 
The condition in item 3 provides a useful method for determining whether
a finite set is linearly independent. This technique is illustrated in the exam- 
ples that follow. 
Example 3 
To prove that the set 


$$S = \{(1,0,0,-1), (0,1,0,-1), (0,0,1,-1), (0,0,0,1)\}$$

is linearly independent, we must show that the only linear combination of
vectors in S that equals the zero vector is the one in which all the coefficients
are zero. Suppose that a₁, a₂, a₃, and a₄ are scalars such that

$$a_1(1,0,0,-1) + a_2(0,1,0,-1) + a_3(0,0,1,-1) + a_4(0,0,0,1) = (0,0,0,0).$$

Equating the corresponding coordinates of the vectors on the left and the right
sides of this equation, we obtain the following system of linear equations.

$$\begin{aligned}
a_1 &= 0\\
a_2 &= 0\\
a_3 &= 0\\
-a_1 - a_2 - a_3 + a_4 &= 0
\end{aligned}$$

Clearly the only solution to this system is a₁ = a₂ = a₃ = a₄ = 0, and so S
is linearly independent. ◆
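Here every vector has a leading 1 in a different position, so elimination finds a pivot in each row. In the Python sketch below, the self-contained helper `rank` (our own name, exact `Fraction` arithmetic) returns 4, the number of vectors. This is consistent with S being linearly independent.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by exact Gaussian elimination on the rows `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

S = [(1, 0, 0, -1), (0, 1, 0, -1), (0, 0, 1, -1), (0, 0, 0, 1)]
print(rank(S))   # 4: as many pivots as vectors, so only the trivial combination gives 0
```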


Example 4 
For k = 0, 1, …, n let pₖ(x) = xᵏ + xᵏ⁺¹ + ⋯ + xⁿ. The set

$$\{p_0(x), p_1(x), \ldots, p_n(x)\}$$

is linearly independent in Pₙ(F). For if

$$a_0p_0(x) + a_1p_1(x) + \cdots + a_np_n(x) = 0$$

for some scalars a₀, a₁, …, aₙ, then

$$a_0 + (a_0 + a_1)x + (a_0 + a_1 + a_2)x^2 + \cdots + (a_0 + a_1 + \cdots + a_n)x^n = 0.$$

By equating the coefficients of xᵏ on both sides of this equation for
k = 0, 1, …, n, we obtain

$$\begin{aligned}
a_0 &= 0\\
a_0 + a_1 &= 0\\
a_0 + a_1 + a_2 &= 0\\
&\;\;\vdots\\
a_0 + a_1 + a_2 + \cdots + a_n &= 0.
\end{aligned}$$

Clearly the only solution to this system of linear equations is a₀ = a₁ = ⋯ =
aₙ = 0. ◆
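For a concrete n, the coefficient vectors of p₀, …, pₙ with respect to 1, x, …, xⁿ form a staircase of 1s. The Python sketch below builds them with our own helper `p_coeffs`, chooses n = 5 as an arbitrary example, and counts pivots with a self-contained `rank` helper (exact `Fraction` arithmetic). The count comes out to n + 1, in agreement with the argument above.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by exact Gaussian elimination on the rows `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def p_coeffs(k, n):
    """Coefficients of 1, x, ..., x^n in p_k(x) = x^k + x^(k+1) + ... + x^n."""
    return [1 if j >= k else 0 for j in range(n + 1)]

n = 5
vectors = [p_coeffs(k, n) for k in range(n + 1)]
print(rank(vectors) == n + 1)   # True
```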
The following important results are immediate consequences of the definitions
of linear dependence and linear independence.


Theorem 1.6. Let V be a vector space, and let S₁ ⊆ S₂ ⊆ V. If S₁ is
linearly dependent, then S₂ is linearly dependent.


Proof. Exercise. ∎


Corollary. Let V be a vector space, and let S₁ ⊆ S₂ ⊆ V. If S₂ is linearly
independent, then S₁ is linearly independent.


Proof. Exercise. ∎


Earlier in this section, we remarked that the issue of whether S is the
smallest generating set for its span is related to the question of whether
some vector in S is a linear combination of the other vectors in S. Thus
the issue of whether S is the smallest generating set for its span is related
to the question of whether S is linearly dependent. To see why, consider
the subset S = {u₁, u₂, u₃, u₄} of ℝ³, where u₁ = (2, −1, 4), u₂ = (1, −1, 3),
u₃ = (1, 1, −1), and u₄ = (1, −2, −1). We have previously noted that S is
linearly dependent; in fact,

$$-2u_1 + 3u_2 + u_3 - 0u_4 = 0.$$

This equation implies that u₃ (or alternatively, u₁ or u₂) is a linear combination
of the other vectors in S. For example, u₃ = 2u₁ − 3u₂ + 0u₄. Therefore
every linear combination a₁u₁ + a₂u₂ + a₃u₃ + a₄u₄ of vectors in S can be
written as a linear combination of u₁, u₂, and u₄:

$$\begin{aligned}
a_1u_1 + a_2u_2 + a_3u_3 + a_4u_4 &= a_1u_1 + a_2u_2 + a_3(2u_1 - 3u_2 + 0u_4) + a_4u_4\\
&= (a_1 + 2a_3)u_1 + (a_2 - 3a_3)u_2 + a_4u_4.
\end{aligned}$$

Thus the subset S′ = {u₁, u₂, u₄} of S has the same span as S!

More generally, suppose that S is any linearly dependent set containing
two or more vectors. Then some vector v ∈ S can be written as a linear
combination of the other vectors in S, and the subset obtained by removing
v from S has the same span as S. It follows that if no proper subset of S
generates the span of S, then S must be linearly independent. Another way
to view the preceding statement is given in Theorem 1.7.


Theorem 1.7. Let S be a linearly independent subset of a vector space
V, and let v be a vector in V that is not in S. Then S ∪ {v} is linearly
dependent if and only if v ∈ span(S).
Proof. If S ∪ {v} is linearly dependent, then there are vectors u₁, u₂, …, uₙ
in S ∪ {v} such that a₁u₁ + a₂u₂ + ⋯ + aₙuₙ = 0 for some nonzero scalars
a₁, a₂, …, aₙ. Because S is linearly independent, one of the uᵢ's, say u₁,
equals v. Thus a₁v + a₂u₂ + ⋯ + aₙuₙ = 0, and so

$$v = a_1^{-1}(-a_2u_2 - \cdots - a_nu_n) = -(a_1^{-1}a_2)u_2 - \cdots - (a_1^{-1}a_n)u_n.$$

Since v is a linear combination of u₂, …, uₙ, which are in S, we have v ∈
span(S).

Conversely, let v ∈ span(S). Then there exist vectors v₁, v₂, …, vₘ in S
and scalars b₁, b₂, …, bₘ such that v = b₁v₁ + b₂v₂ + ⋯ + bₘvₘ. Hence

$$0 = b_1v_1 + b_2v_2 + \cdots + b_mv_m + (-1)v.$$

Since v ≠ vᵢ for i = 1, 2, …, m, the coefficient of v in this linear combination
is nonzero, and so the set {v₁, v₂, …, vₘ, v} is linearly dependent. Therefore
S ∪ {v} is linearly dependent by Theorem 1.6. ∎


Linearly independent generating sets are investigated in detail in Sec- 
tion 1.6. 


EXERCISES 


1. Label the following statements as true or false. 


(a) If S is a linearly dependent set, then each vector in S is a linear
combination of other vectors in S.

(b) Any set containing the zero vector is linearly dependent. 

(c) The empty set is linearly dependent. 

(d) Subsets of linearly dependent sets are linearly dependent. 

(e) Subsets of linearly independent sets are linearly independent. 

(f) If a₁x₁ + a₂x₂ + ⋯ + aₙxₙ = 0 and x₁, x₂, …, xₙ are linearly
independent, then all the scalars aᵢ are zero.


2.³ Determine whether the following sets are linearly dependent or linearly
independent.


(4 -9-Ch Damen 
(LG Dp 


(c) {x³ + 2x², −x² + 3x + 1, x³ − x² + 2x − 1} in P₃(ℝ)


³The computations in Exercise 2(g), (h), (i), and (j) are tedious unless technology is
used.
(d) {x³ − x, 2x² + 4, −2x³ + 3x² + 2x + 6} in P₃(ℝ)
(e) {(1, −1, 2), (1, −2, 1), (1, 1, 4)} in ℝ³
(f) {(1, −1, 2), (2, 0, 1), (−1, 2, −1)} in ℝ³


© (49:0 ).C 3G Dpermom
© (29-6 96 9G pam


(i) {x⁴ − x³ + 5x² − 8x + 6, −x⁴ + x³ − 5x² + 5x − 3,
+ 4a? —2+1,23 —2+2} in Py(R)
(j) {x⁴ − x³ + 5x² − 8x + 6, −x⁴ + x³ − 5x² + 5x − 3,
ere:




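3. In M₃ₓ₂(F), prove that the set

$$\left\{\begin{pmatrix}1 & 1\\ 0 & 0\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 1 & 1\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 0 & 0\\ 1 & 1\end{pmatrix}, \begin{pmatrix}1 & 0\\ 1 & 0\\ 1 & 0\end{pmatrix}, \begin{pmatrix}0 & 1\\ 0 & 1\\ 0 & 1\end{pmatrix}\right\}$$

is linearly dependent.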


4. In Fⁿ, let eⱼ denote the vector whose jth coordinate is 1 and whose other
coordinates are 0. Prove that {e₁, e₂, …, eₙ} is linearly independent.


5. Show that the set {1, x, x², …, xⁿ} is linearly independent in Pₙ(F).

6. In Mₘₓₙ(F), let Eⁱʲ denote the matrix whose only nonzero entry is 1 in
the ith row and jth column. Prove that {Eⁱʲ : 1 ≤ i ≤ m, 1 ≤ j ≤ n}
is linearly independent.


7. Recall from Example 3 in Section 1.3 that the set of diagonal matrices in
M₂ₓ₂(F) is a subspace. Find a linearly independent set that generates
this subspace.


8. Let S = {(1,1,0), (1,0,1), (0,1,1)} be a subset of the vector space F³.
(a) Prove that if F = ℝ, then S is linearly independent.
(b) Prove that if F has characteristic 2, then S is linearly dependent.


9. Let u and v be distinct vectors in a vector space V. Show that {u, v} is
linearly dependent if and only if u or v is a multiple of the other.


10. Give an example of three linearly dependent vectors in ℝ³ such that
none of the three is a multiple of another.
11. Let S = {u₁, u₂, …, uₙ} be a linearly independent subset of a vector
space V over the field Z₂. How many vectors are there in span(S)?
Justify your answer.


12. Prove Theorem 1.6 and its corollary.


13. Let V be a vector space over a field of characteristic not equal to two.

(a) Let u and v be distinct vectors in V. Prove that {u, v} is linearly
independent if and only if {u + v, u − v} is linearly independent.

(b) Let u, v, and w be distinct vectors in V. Prove that {u, v, w} is
linearly independent if and only if {u + v, u + w, v + w} is linearly
independent.


14. Prove that a set S is linearly dependent if and only if S = {0} or
there exist distinct vectors v, u₁, u₂, …, uₙ in S such that v is a linear
combination of u₁, u₂, …, uₙ.


15. Let S = {u₁, u₂, …, uₙ} be a finite set of vectors. Prove that S is
linearly dependent if and only if u₁ = 0 or uₖ₊₁ ∈ span({u₁, u₂, …, uₖ})
for some k (1 ≤ k < n).


16. Prove that a set S of vectors is linearly independent if and only if each
finite subset of S is linearly independent.


17. Let M be a square upper triangular matrix (as defined in Exercise 12
of Section 1.3) with nonzero diagonal entries. Prove that the columns
of M are linearly independent.


18. Let S be a set of nonzero polynomials in P(F) such that no two have
the same degree. Prove that S is linearly independent.


19. Prove that if {A₁, A₂, …, Aₖ} is a linearly independent subset of
Mₙₓₙ(F), then {A₁ᵗ, A₂ᵗ, …, Aₖᵗ} is also linearly independent.


20. Let f, g ∈ F(ℝ, ℝ) be the functions defined by f(t) = eʳᵗ and g(t) = eˢᵗ,
where r ≠ s. Prove that f and g are linearly independent in F(ℝ, ℝ).


1.6. BASES AND DIMENSION


We saw in Section 1.5 that if S is a generating set for a subspace W and 
no proper subset of S is a generating set for W, then S must be linearly 
independent. A linearly independent generating set for W possesses a very 
useful property—every vector in W can be expressed in one and only one way 
as a linear combination of the vectors in the set. (This property is proved 
below in Theorem 1.8.) It is this property that makes linearly independent 
generating sets the building blocks of vector spaces. 
Definition. A basis β for a vector space V is a linearly independent
subset of V that generates V. If β is a basis for V, we also say that the
vectors of β form a basis for V.


Example 1 


Recalling that span(∅) = {0} and ∅ is linearly independent, we see that ∅
is a basis for the zero vector space.


Example 2 
In Fⁿ, let e₁ = (1,0,0,…,0), e₂ = (0,1,0,…,0), …, eₙ = (0,0,…,0,1);
{e₁, e₂, …, eₙ} is readily seen to be a basis for Fⁿ and is called the standard
basis for Fⁿ. ◆


Example 3 

In Mₘₓₙ(F), let Eⁱʲ denote the matrix whose only nonzero entry is a 1 in
the ith row and jth column. Then {Eⁱʲ : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for
Mₘₓₙ(F). ◆


Example 4 


In Pₙ(F) the set {1, x, x², …, xⁿ} is a basis. We call this basis the standard
basis for Pₙ(F).


Example 5 
In P(F) the set {1, x, x², …} is a basis.


Observe that Example 5 shows that a basis need not be finite. In fact,
later in this section it is shown that no basis for P(F) can be finite. Hence
not every vector space has a finite basis.

The next theorem, which is used frequently in Chapter 2, establishes the 
most significant property of a basis. 


Theorem 1.8. Let V be a vector space and β = {u₁, u₂, …, uₙ} be a
subset of V. Then β is a basis for V if and only if each v ∈ V can be uniquely
expressed as a linear combination of vectors of β, that is, can be expressed in
the form

$$v = a_1u_1 + a_2u_2 + \cdots + a_nu_n$$

for unique scalars a₁, a₂, …, aₙ.


Proof. Let β be a basis for V. If v ∈ V, then v ∈ span(β) because
span(β) = V. Thus v is a linear combination of the vectors of β. Suppose
that

$$v = a_1u_1 + a_2u_2 + \cdots + a_nu_n \quad\text{and}\quad v = b_1u_1 + b_2u_2 + \cdots + b_nu_n$$

are two such representations of v. Subtracting the second equation from the
first gives

$$0 = (a_1 - b_1)u_1 + (a_2 - b_2)u_2 + \cdots + (a_n - b_n)u_n.$$

Since β is linearly independent, it follows that a₁ − b₁ = a₂ − b₂ = ⋯ =
aₙ − bₙ = 0. Hence a₁ = b₁, a₂ = b₂, …, aₙ = bₙ, and so v is uniquely
expressible as a linear combination of the vectors of β.

The proof of the converse is an exercise. ∎
Theorem 1.8 shows that if the vectors u₁, u₂, …, uₙ form a basis for a
vector space V, then every vector in V can be uniquely expressed in the form

$$v = a_1u_1 + a_2u_2 + \cdots + a_nu_n$$

for appropriately chosen scalars a₁, a₂, …, aₙ. Thus v determines a unique
n-tuple of scalars (a₁, a₂, …, aₙ) and, conversely, each n-tuple of scalars
determines a unique vector v ∈ V by using the entries of the n-tuple as the
coefficients of a linear combination of u₁, u₂, …, uₙ. This fact suggests that
V is like the vector space Fⁿ, where n is the number of vectors in the basis
for V. We see in Section 2.4 that this is indeed the case.
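For a basis of Fⁿ, the unique scalars of Theorem 1.8 come from solving one square system of linear equations. The Python sketch below does this with Gauss-Jordan elimination. It assumes rational scalars (via `fractions`) and a list of n vectors in Fⁿ, and the function name `coordinates` is our own. The example basis is the one from Example 3 of Section 1.4.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Scalars a_1..a_n with v = a_1*basis[0] + ... + a_n*basis[n-1].

    Sketch assumptions: `basis` is n vectors in F^n (F = the rationals);
    raises ValueError if they do not form a basis.
    """
    n = len(basis)
    # Row i holds the i-th coordinate of each basis vector, then v[i].
    aug = [[Fraction(b[i]) for b in basis] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors are not a basis of F^n")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]          # scale pivot row so pivot is 1
        for i in range(n):                            # clear the column elsewhere
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

beta = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
a = coordinates(beta, (3, -1, 4))
print([str(x) for x in a])   # ['-1', '4', '0']
# Rebuilding v from its coordinates recovers it: the n-tuple determines the vector.
assert tuple(sum(c * b[i] for c, b in zip(a, beta)) for i in range(3)) == (3, -1, 4)
```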

In this book, we are primarily interested in vector spaces having finite 
bases. Theorem 1.9 identifies a large class of vector spaces of this type. 


Theorem 1.9. If a vector space V is generated by a finite set S, then 
some subset of S is a basis for V. Hence V has a finite basis. 


Proof. If S = ∅ or S = {0}, then V = {0} and ∅ is a subset of S that is a
basis for V. Otherwise S contains a nonzero vector u₁. By item 2 on page 37,
{u₁} is a linearly independent set. Continue, if possible, choosing vectors
u₂, …, uₖ in S such that {u₁, u₂, …, uₖ} is linearly independent. Since S is
a finite set, we must eventually reach a stage at which β = {u₁, u₂, …, uₖ} is
a linearly independent subset of S, but adjoining to β any vector in S not in β
produces a linearly dependent set. We claim that β is a basis for V. Because
β is linearly independent by construction, it suffices to show that β spans V.
By Theorem 1.5 (p. 30) we need to show that S ⊆ span(β). Let v ∈ S. If
v ∈ β, then clearly v ∈ span(β). Otherwise, if v ∉ β, then the preceding
construction shows that β ∪ {v} is linearly dependent. So v ∈ span(β) by
Theorem 1.7 (p. 39). Thus S ⊆ span(β). ∎


Because of the method by which the basis β was obtained in the proof
of Theorem 1.9, this theorem is often remembered as saying that a finite
spanning set for V can be reduced to a basis for V. This method is illustrated
in the next example.
Example 6
Let

$$S = \{(2,-3,5), (8,-12,20), (1,0,-2), (0,2,-1), (7,2,0)\}.$$

It can be shown that S generates ℝ³. We can select a basis for ℝ³ that
is a subset of S by the technique used in proving Theorem 1.9. To start,
select any nonzero vector in S, say (2, −3, 5), to be a vector in the basis.
Since 4(2, −3, 5) = (8, −12, 20), the set {(2, −3, 5), (8, −12, 20)} is linearly
dependent by Exercise 9 of Section 1.5. Hence we do not include (8, −12, 20)
in our basis. On the other hand, (1, 0, −2) is not a multiple of (2, −3, 5) and
vice versa, so that the set {(2, −3, 5), (1, 0, −2)} is linearly independent. Thus
we include (1, 0, −2) as part of our basis.


Now we consider the set {(2, −3, 5), (1, 0, −2), (0, 2, −1)} obtained by adjoining
another vector in S to the two vectors that we have already included
in our basis. As before, we include (0, 2, −1) in our basis or exclude it from
the basis according to whether {(2, −3, 5), (1, 0, −2), (0, 2, −1)} is linearly
independent or linearly dependent. An easy calculation shows that this set is
linearly independent, and so we include (0, 2, −1) in our basis. In a similar
fashion the final vector in S is included or excluded from our basis according
to whether the set

$$\{(2,-3,5), (1,0,-2), (0,2,-1), (7,2,0)\}$$

is linearly independent or linearly dependent. Because

$$2(2,-3,5) + 3(1,0,-2) + 4(0,2,-1) - (7,2,0) = (0,0,0),$$

we exclude (7, 2, 0) from our basis. We conclude that

$$\{(2,-3,5), (1,0,-2), (0,2,-1)\}$$

is a subset of S that is a basis for ℝ³. ◆
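The selection procedure in this example, which is also the one in the proof of Theorem 1.9, is a greedy loop. Each vector of S is kept only if adjoining it to the vectors already kept leaves the set linearly independent. In the Python sketch below, independence is tested with a self-contained pivot-counting helper, `rank` (exact `Fraction` arithmetic). Both `rank` and `reduce_to_basis` are our own names. Run on the set S above, the sketch reproduces the basis found in the text.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by exact Gaussian elimination on the rows `vectors`."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def reduce_to_basis(S):
    """Keep v from S (in order) only if it is not in the span of the vectors kept so far."""
    basis = []
    for v in S:
        # basis is independent, so rank(basis) == len(basis); a jump means v is new.
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

S = [(2, -3, 5), (8, -12, 20), (1, 0, -2), (0, 2, -1), (7, 2, 0)]
print(reduce_to_basis(S))   # [(2, -3, 5), (1, 0, -2), (0, 2, -1)]
```

The order of S matters. A different ordering may select a different subset, but by Corollary 1 below every such subset has the same number of vectors.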


The corollaries of the following theorem are perhaps the most significant 
results in Chapter 1. 


Theorem 1.10 (Replacement Theorem). Let V be a vector space
that is generated by a set G containing exactly n vectors, and let L be a
linearly independent subset of V containing exactly m vectors. Then m ≤ n
and there exists a subset H of G containing exactly n − m vectors such that
L ∪ H generates V.


Proof. The proof is by mathematical induction on m. The induction begins
with m = 0; for in this case L = ∅, and so taking H = G gives the desired
result.
Now suppose that the theorem is true for some integer m ≥ 0. We prove
that the theorem is true for m + 1. Let L = {v₁, v₂, …, vₘ₊₁} be a linearly
independent subset of V consisting of m + 1 vectors. By the corollary to
Theorem 1.6 (p. 39), {v₁, v₂, …, vₘ} is linearly independent, and so we may
apply the induction hypothesis to conclude that m ≤ n and that there is a
subset {u₁, u₂, …, uₙ₋ₘ} of G such that {v₁, v₂, …, vₘ} ∪ {u₁, u₂, …, uₙ₋ₘ}
generates V. Thus there exist scalars a₁, a₂, …, aₘ, b₁, b₂, …, bₙ₋ₘ such that

$$a_1v_1 + a_2v_2 + \cdots + a_mv_m + b_1u_1 + b_2u_2 + \cdots + b_{n-m}u_{n-m} = v_{m+1}. \tag{9}$$

Note that n − m > 0, lest vₘ₊₁ be a linear combination of v₁, v₂, …, vₘ,
which by Theorem 1.7 (p. 39) contradicts the assumption that L is linearly
independent. Hence n > m; that is, n ≥ m + 1. Moreover, some bᵢ, say b₁, is
nonzero, for otherwise we obtain the same contradiction. Solving (9) for u₁
gives


$$\begin{aligned}
u_1 = {}&(-b_1^{-1}a_1)v_1 + (-b_1^{-1}a_2)v_2 + \cdots + (-b_1^{-1}a_m)v_m + (b_1^{-1})v_{m+1}\\
&+ (-b_1^{-1}b_2)u_2 + \cdots + (-b_1^{-1}b_{n-m})u_{n-m}.
\end{aligned}$$

Let H = {u₂, …, uₙ₋ₘ}. Then u₁ ∈ span(L ∪ H), and because v₁, v₂, …, vₘ,
u₂, …, uₙ₋ₘ are clearly in span(L ∪ H), it follows that

$$\{v_1, v_2, \ldots, v_m, u_1, u_2, \ldots, u_{n-m}\} \subseteq \operatorname{span}(L \cup H).$$

Because {v₁, v₂, …, vₘ, u₁, u₂, …, uₙ₋ₘ} generates V, Theorem 1.5 (p. 30)
implies that span(L ∪ H) = V. Since H is a subset of G that contains
(n − m) − 1 = n − (m + 1) vectors, the theorem is true for m + 1. This
completes the induction. ∎


Corollary 1. Let V be a vector space having a finite basis. Then every 
basis for V contains the same number of vectors. 


Proof. Suppose that β is a finite basis for V that contains exactly n vectors,
and let γ be any other basis for V. If γ contains more than n vectors, then
we can select a subset S of γ containing exactly n + 1 vectors. Since S is
linearly independent and β generates V, the replacement theorem implies that
n + 1 ≤ n, a contradiction. Therefore γ is finite, and the number m of vectors
in γ satisfies m ≤ n. Reversing the roles of β and γ and arguing as above, we
obtain n ≤ m. Hence m = n. ∎


If a vector space V has a finite basis, Corollary 1 asserts that the number
of vectors in any basis for V is an intrinsic property of V. This fact makes
possible the following important definitions.


Definitions. A vector space is called finite-dimensional if it has a 
basis consisting of a finite number of vectors. The unique number of vectors 
in each basis for V is called the dimension of V and is denoted by dim(V).
A vector space that is not finite-dimensional is called infinite-dimensional.


The following results are consequences of Examples 1 through 4. 


Example 7 


The vector space {0} has dimension zero. 


Example 8 


The vector space Fⁿ has dimension n.


Example 9 


The vector space Mₘₓₙ(F) has dimension mn.


Example 10 
The vector space Pₙ(F) has dimension n + 1.


The following examples show that the dimension of a vector space depends 
on its field of scalars. 


Example 11 


Over the field of complex numbers, the vector space of complex numbers has 
dimension 1. (A basis is {1}.) 


Example 12 


Over the field of real numbers, the vector space of complex numbers has 
dimension 2. (A basis is {1, i}.)


In the terminology of dimension, the first conclusion in the replacement
theorem states that if V is a finite-dimensional vector space, then no linearly
independent subset of V can contain more than dim(V) vectors. From this
fact it follows that the vector space P(F) is infinite-dimensional because it
has an infinite linearly independent set, namely {1, x, x², …}. This set is,
in fact, a basis for P(F). Yet nothing that we have proved in this section
guarantees that an infinite-dimensional vector space must have a basis. In Section
1.7 it is shown, however, that every vector space has a basis.

Just as no linearly independent subset of a finite-dimensional vector space 
V can contain more than dim(V) vectors, a corresponding statement can be 
made about the size of a generating set. 


Corollary 2. Let V be a vector space with dimension n. 
(a) Any finite generating set for V contains at least n vectors, and a gener- 
ating set for V that contains exactly n vectors is a basis for V. 
(b) Any linearly independent subset of V that contains exactly n vectors is
a basis for V.


(c) Every linearly independent subset of V can be extended to a basis for 
V. 


Proof. Let 3 be a basis for V. 

(a) Let G be a finite generating set for V. By Theorem 1.9 some subset H 
of G is a basis for V. Corollary 1 implies that H contains exactly n vectors. 
Since a subset of G contains n vectors, G must contain at least n vectors. 
Moreover, if G contains exactly n vectors, then we must have H = G, so that 
G is a basis for V. 

(b) Let L be a linearly independent subset of V containing exactly n
vectors. It follows from the replacement theorem that there is a subset H of
$\beta$ containing $n - n = 0$ vectors such that $L \cup H$ generates V. Thus $H = \emptyset$,
and L generates V. Since L is also linearly independent, L is a basis for V.

(c) If L is a linearly independent subset of V containing m vectors, then
the replacement theorem asserts that there is a subset H of $\beta$ containing
exactly $n - m$ vectors such that $L \cup H$ generates V. Now $L \cup H$ contains at
most n vectors; therefore (a) implies that $L \cup H$ contains exactly n vectors
and that $L \cup H$ is a basis for V. ■
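Part (c) can be made concrete in $Q^n$, taking $\beta$ to be the standard basis. The sketch below is a minimal illustration under that assumption, not the construction used in the proof: the helper names `rank` and `extend_to_basis` are hypothetical, vectors are lists of rationals, and the standard vectors $e_1, \ldots, e_n$ are adjoined one at a time whenever doing so keeps the set linearly independent. Each rejected $e_j$ already lies in the span of the vectors chosen so far, so the final list generates $Q^n$ and, being linearly independent, is a basis.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of a list of vectors in Q^n (Gauss-Jordan elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(independent, n):
    """Extend a linearly independent list in Q^n to a basis of Q^n.

    Assumes `independent` really is linearly independent. Standard vectors
    e_1, ..., e_n are tried in order; e_j is kept only if it is not in the
    span of the vectors already chosen.
    """
    basis = [list(v) for v in independent]
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]
        if rank(basis + [e_j]) > len(basis):
            basis.append(e_j)
    return basis


print(extend_to_basis([[1, 1, 0]], 3))  # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The standard vectors that get adjoined play the role of the set H in the proof of (c).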


Example 13 
It follows from Example 4 of Section 1.4 and (a) of Corollary 2 that

$$\{x^2 + 3x - 2,\ 2x^2 + 5x - 3,\ -x^2 - 4x + 4\}$$

is a basis for $P_2(R)$.


Example 14 
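It follows from Example 5 of Section 1.4 and (a) of Corollary 2 that

$$\left\{ \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right\}$$

is a basis for $M_{2 \times 2}(R)$.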


Example 15 
It follows from Example 3 of Section 1.5 and (b) of Corollary 2 that

$$\{(1, 0, 0, -1),\ (0, 1, 0, -1),\ (0, 0, 1, -1),\ (0, 0, 0, 1)\}$$

is a basis for $R^4$.
Example 16


For $k = 0, 1, \ldots, n$, let $p_k(x) = x^k + x^{k+1} + \cdots + x^n$. It follows from Example 4
of Section 1.5 and (b) of Corollary 2 that

$$\{p_0(x), p_1(x), \ldots, p_n(x)\}$$

is a basis for $P_n(F)$.


A procedure for reducing a generating set to a basis was illustrated in 
Example 6. In Section 3.4, when we have learned more about solving systems 
of linear equations, we will discover a simpler method for reducing a gener- 
ating set to a basis. This procedure also can be used to extend a linearly 
independent set to a basis, as (c) of Corollary 2 asserts is possible. 


An Overview of Dimension and Its Consequences 


Theorem 1.9 as well as the replacement theorem and its corollaries contain 
a wealth of information about the relationships among linearly independent 
sets, bases, and generating sets. For this reason, we summarize here the main 
results of this section in order to put them into better perspective. 


A basis for a vector space V is a linearly independent subset of V that 
generates V. If V has a finite basis, then every basis for V contains the same 
number of vectors. This number is called the dimension of V, and V is said 
to be finite-dimensional. Thus if the dimension of V is n, every basis for V 
contains exactly n vectors. Moreover, every linearly independent subset of 
V contains no more than n vectors and can be extended to a basis for V 
by including appropriately chosen vectors. Also, each generating set for V 
contains at least n vectors and can be reduced to a basis for V by excluding 
appropriately chosen vectors. The Venn diagram in Figure 1.6 depicts these 
relationships. 


Figure 1.6: Venn diagram of linearly independent sets and generating sets
The Dimension of Subspaces


Our next result relates the dimension of a subspace to the dimension of 
the vector space that contains it. 


Theorem 1.11. Let W be a subspace of a finite-dimensional vector space
V. Then W is finite-dimensional and $\dim(W) \le \dim(V)$. Moreover, if
$\dim(W) = \dim(V)$, then $V = W$.

Proof. Let dim(V) = n. If W = {0}, then W is finite-dimensional and
$\dim(W) = 0 \le n$. Otherwise, W contains a nonzero vector $x_1$; so $\{x_1\}$ is a
linearly independent set. Continue choosing vectors $x_1, x_2, \ldots, x_k$ in W such
that $\{x_1, x_2, \ldots, x_k\}$ is linearly independent. Since no linearly independent
subset of V can contain more than n vectors, this process must stop at a
stage where $k \le n$ and $\{x_1, x_2, \ldots, x_k\}$ is linearly independent but adjoining
any other vector from W produces a linearly dependent set. Theorem 1.7
(p. 39) implies that $\{x_1, x_2, \ldots, x_k\}$ generates W, and hence it is a basis for
W. Therefore $\dim(W) = k \le n$.

If dim(W) = n, then a basis for W is a linearly independent subset of V 
containing n vectors. But Corollary 2 of the replacement theorem implies 
that this basis for W is also a basis for V; so W = V. 


Example 17 
Let 


$$W = \{(a_1, a_2, a_3, a_4, a_5) \in F^5 : a_1 + a_3 + a_5 = 0,\ a_2 = a_4\}.$$

It is easily shown that W is a subspace of $F^5$ having

$$\{(-1, 0, 1, 0, 0),\ (-1, 0, 0, 0, 1),\ (0, 1, 0, 1, 0)\}$$

as a basis. Thus dim(W) = 3.
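The "easily shown" step can be checked directly. The two defining conditions force $a_1 = -a_3 - a_5$ and $a_4 = a_2$, so a vector of W should have coordinates $(a_3, a_5, a_2)$ with respect to the proposed basis. The sketch below (all names are illustrative, not from the text) tests membership, rebuilds a sample vector of W from those coordinates, and exposes the general linear combination.

```python
# Proposed basis for W = {(a1, ..., a5) : a1 + a3 + a5 = 0, a2 = a4} (Example 17).
BASIS = [(-1, 0, 1, 0, 0), (-1, 0, 0, 0, 1), (0, 1, 0, 1, 0)]


def in_W(v):
    """Test the two conditions that define W."""
    a1, a2, a3, a4, a5 = v
    return a1 + a3 + a5 == 0 and a2 == a4


def combine(coeffs):
    """Return c1*b1 + c2*b2 + c3*b3 for the vectors b_i in BASIS."""
    return tuple(sum(c * b[k] for c, b in zip(coeffs, BASIS)) for k in range(5))


def coordinates(v):
    """Coefficients of a vector v in W with respect to BASIS: they are a3, a5, a2."""
    a1, a2, a3, a4, a5 = v
    return (a3, a5, a2)


w = (-5, 2, 3, 2, 2)  # a vector in W
print(in_W(w), combine(coordinates(w)) == w)  # True True
```

Since `combine((c1, c2, c3))` equals $(-c_1 - c_2, c_3, c_1, c_3, c_2)$, it is the zero vector only when $c_1 = c_2 = c_3 = 0$, which is the linear independence half of the claim.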


Example 18 


The set of diagonal $n \times n$ matrices is a subspace W of $M_{n \times n}(F)$ (see Example 3
of Section 1.3). A basis for W is

$$\{E^{11}, E^{22}, \ldots, E^{nn}\},$$

where $E^{ij}$ is the matrix in which the only nonzero entry is a 1 in the ith row
and jth column. Thus dim(W) = n.


Example 19 


We saw in Section 1.3 that the set of symmetric $n \times n$ matrices is a subspace
W of $M_{n \times n}(F)$. A basis for W is

$$\{A^{ij} : 1 \le i \le j \le n\},$$

where $A^{ij}$ is the $n \times n$ matrix having 1 in the ith row and jth column, 1 in
the jth row and ith column, and 0 elsewhere. It follows that

$$\dim(W) = n + (n-1) + \cdots + 1 = \tfrac{1}{2}n(n+1).$$
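To see the count concretely, the sketch below builds the matrices $A^{ij}$ for a given n and rebuilds a symmetric matrix from its entries on and above the diagonal. The function names are hypothetical, and indices are 0-based (an implementation choice), whereas the text counts from 1.

```python
def symmetric_basis(n):
    """The matrices A^{ij}, i <= j, as a dict {(i, j): matrix}; indices are 0-based."""
    basis = {}
    for i in range(n):
        for j in range(i, n):
            A = [[0] * n for _ in range(n)]
            A[i][j] = 1
            A[j][i] = 1  # same entry as the line above when i == j
            basis[(i, j)] = A
    return basis


def rebuild(S):
    """Return the sum of S[i][j] * A^{ij} over i <= j; this equals S when S is symmetric."""
    n = len(S)
    total = [[0] * n for _ in range(n)]
    for (i, j), A in symmetric_basis(n).items():
        for r in range(n):
            for c in range(n):
                total[r][c] += S[i][j] * A[r][c]
    return total


for n in range(1, 6):
    print(n, len(symmetric_basis(n)), n * (n + 1) // 2)  # the two counts agree
```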


Corollary. If W is a subspace of a finite-dimensional vector space V, then 
any basis for W can be extended to a basis for V. 


Proof. Let S be a basis for W. Because S is a linearly independent subset of 
V, Corollary 2 of the replacement theorem guarantees that S can be extended 
to a basis for V. ■


Example 20 
The set of all polynomials of the form

$$a_{18}x^{18} + a_{16}x^{16} + \cdots + a_2x^2 + a_0,$$

where $a_{18}, a_{16}, \ldots, a_2, a_0 \in F$, is a subspace W of $P_{18}(F)$. A basis for W is
$\{1, x^2, \ldots, x^{16}, x^{18}\}$, which is a subset of the standard basis for $P_{18}(F)$.


We can apply Theorem 1.11 to determine the subspaces of $R^2$ and $R^3$.
Since $R^2$ has dimension 2, subspaces of $R^2$ can be of dimensions 0, 1, or 2
only. The only subspaces of dimension 0 or 2 are {0} and $R^2$, respectively.
Any subspace of $R^2$ having dimension 1 consists of all scalar multiples of some
nonzero vector in $R^2$ (Exercise 11 of Section 1.4).

If a point of $R^2$ is identified in the natural way with a point in the Euclidean
plane, then it is possible to describe the subspaces of $R^2$ geometrically: A
subspace of $R^2$ having dimension 0 consists of the origin of the Euclidean
plane, a subspace of $R^2$ with dimension 1 consists of a line through the origin,
and a subspace of $R^2$ having dimension 2 is the entire Euclidean plane.

Similarly, the subspaces of $R^3$ must have dimensions 0, 1, 2, or 3. Interpreting
these possibilities geometrically, we see that a subspace of dimension
zero must be the origin of Euclidean 3-space, a subspace of dimension 1 is
a line through the origin, a subspace of dimension 2 is a plane through the
origin, and a subspace of dimension 3 is Euclidean 3-space itself.


The Lagrange Interpolation Formula 


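Corollary 2 of the replacement theorem can be applied to obtain a useful
formula. Let $c_0, c_1, \ldots, c_n$ be distinct scalars in an infinite field F. The
polynomials $f_0(x), f_1(x), \ldots, f_n(x)$ defined by

$$f_i(x) = \frac{(x - c_0)\cdots(x - c_{i-1})(x - c_{i+1})\cdots(x - c_n)}{(c_i - c_0)\cdots(c_i - c_{i-1})(c_i - c_{i+1})\cdots(c_i - c_n)} = \prod_{\substack{k=0 \\ k \neq i}}^{n} \frac{x - c_k}{c_i - c_k}$$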
are called the Lagrange polynomials (associated with $c_0, c_1, \ldots, c_n$). Note
that each $f_i(x)$ is a polynomial of degree n and hence is in $P_n(F)$. By regarding
$f_i(x)$ as a polynomial function $f_i\colon F \to F$, we see that

$$f_i(c_j) = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases} \tag{10}$$


This property of Lagrange polynomials can be used to show that $\beta =
\{f_0, f_1, \ldots, f_n\}$ is a linearly independent subset of $P_n(F)$. Suppose that

$$\sum_{i=0}^{n} a_i f_i = 0 \quad \text{for some scalars } a_0, a_1, \ldots, a_n,$$

where 0 denotes the zero function. Then

$$\sum_{i=0}^{n} a_i f_i(c_j) = 0 \quad \text{for } j = 0, 1, \ldots, n.$$

But also

$$\sum_{i=0}^{n} a_i f_i(c_j) = a_j$$

by (10). Hence $a_j = 0$ for $j = 0, 1, \ldots, n$; so $\beta$ is linearly independent. Since
the dimension of $P_n(F)$ is n + 1, it follows from Corollary 2 of the replacement
theorem that $\beta$ is a basis for $P_n(F)$.

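Because $\beta$ is a basis for $P_n(F)$, every polynomial function g in $P_n(F)$ is a
linear combination of polynomial functions of $\beta$, say,

$$g = \sum_{i=0}^{n} b_i f_i.$$

It follows that

$$g(c_j) = \sum_{i=0}^{n} b_i f_i(c_j) = b_j;$$

so

$$g = \sum_{i=0}^{n} g(c_i) f_i$$

is the unique representation of g as a linear combination of elements of $\beta$.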
This representation is called the Lagrange interpolation formula. Notice
that the preceding argument shows that if $b_0, b_1, \ldots, b_n$ are any n + 1 scalars
in F (not necessarily distinct), then the polynomial function

$$g = \sum_{i=0}^{n} b_i f_i$$

is the unique polynomial in $P_n(F)$ such that $g(c_j) = b_j$. Thus we have found
the unique polynomial of degree not exceeding n that has specified values
$b_j$ at given points $c_j$ in its domain ($j = 0, 1, \ldots, n$). For example, let us
construct the real polynomial g of degree at most 2 whose graph contains the
points (1, 8), (2, 5), and (3, −4). (Thus, in the notation above, $c_0 = 1$, $c_1 = 2$,
$c_2 = 3$, $b_0 = 8$, $b_1 = 5$, and $b_2 = -4$.) The Lagrange polynomials associated
with $c_0$, $c_1$, and $c_2$ are


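$$f_0(x) = \frac{(x-2)(x-3)}{(1-2)(1-3)} = \tfrac{1}{2}(x^2 - 5x + 6),$$

$$f_1(x) = \frac{(x-1)(x-3)}{(2-1)(2-3)} = -1\,(x^2 - 4x + 3),$$

and

$$f_2(x) = \frac{(x-1)(x-2)}{(3-1)(3-2)} = \tfrac{1}{2}(x^2 - 3x + 2).$$

Hence the desired polynomial is

$$\begin{aligned} g(x) &= \sum_{i=0}^{2} b_i f_i(x) = 8f_0(x) + 5f_1(x) - 4f_2(x) \\ &= 4(x^2 - 5x + 6) - 5(x^2 - 4x + 3) - 2(x^2 - 3x + 2) \\ &= -3x^2 + 6x + 5. \end{aligned}$$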
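The computation above is mechanical enough to script. The sketch below is one straightforward way to do it, assuming polynomials are stored as coefficient lists with the constant term first and using exact rational arithmetic from Python's `fractions` module; the helper names are hypothetical. It forms each $f_i$ as the product of the factors $(x - c_k)/(c_i - c_k)$ for $k \neq i$, then sums $b_i f_i$.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists [a0, a1, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def lagrange_poly(cs, i):
    """Coefficients of f_i(x) = product over k != i of (x - c_k) / (c_i - c_k)."""
    f = [Fraction(1)]
    for k, ck in enumerate(cs):
        if k != i:
            d = Fraction(cs[i] - ck)  # nonzero because the c_k are distinct
            f = poly_mul(f, [Fraction(-ck) / d, 1 / d])
    return f


def interpolate(cs, bs):
    """Coefficients of g = b_0 f_0 + ... + b_n f_n, so that g(c_j) = b_j."""
    g = [Fraction(0)] * len(cs)
    for i, b in enumerate(bs):
        for k, coeff in enumerate(lagrange_poly(cs, i)):
            g[k] += b * coeff
    return g


print([str(c) for c in interpolate([1, 2, 3], [8, 5, -4])])  # ['5', '6', '-3'], i.e. -3x^2 + 6x + 5
```

Exact `Fraction` arithmetic avoids the rounding that floating-point division by $c_i - c_k$ would introduce.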
An important consequence of the Lagrange interpolation formula is the following
result: If $f \in P_n(F)$ and $f(c_i) = 0$ for n + 1 distinct scalars $c_0, c_1, \ldots, c_n$
in F, then f is the zero function.


EXERCISES 


1. Label the following statements as true or false. 


(a) The zero vector space has no basis. 

(b) Every vector space that is generated by a finite set has a basis. 
(c) Every vector space has a finite basis. 

(d) A vector space cannot have more than one basis. 
(e) If a vector space has a finite basis, then the number of vectors in
every basis is the same.

(f) The dimension of $P_n(F)$ is n.

(g) The dimension of $M_{m \times n}(F)$ is m + n.

(h) Suppose that V is a finite-dimensional vector space, that $S_1$ is a
linearly independent subset of V, and that $S_2$ is a subset of V that
generates V. Then $S_1$ cannot contain more vectors than $S_2$.

(i) If S generates the vector space V, then every vector in V can be
written as a linear combination of vectors in S in only one way.

(j) Every subspace of a finite-dimensional space is finite-dimensional.

(k) If V is a vector space having dimension n, then V has exactly one
subspace with dimension 0 and exactly one subspace with dimension n.

(l) If V is a vector space having dimension n, and if S is a subset of
V with n vectors, then S is linearly independent if and only if S
spans V.


2. Determine which of the following sets are bases for $R^3$.

(a) $\{(1, 0, -1), (2, 5, 1), (0, -4, 3)\}$
(b) $\{(2, -4, 1), (0, 3, -1), (6, 0, -1)\}$
(c) $\{(1, 2, -1), (1, 0, 2), (2, 1, 1)\}$
(d) $\{(-1, 3, 1), (2, -4, -3), (3, 8, 2)\}$
(e) $\{(1, -3, -2), (-3, 1, 3), (-2, -10, -2)\}$

3. Determine which of the following sets are bases for $P_2(R)$.

(a) $\{-1 - x + 2x^2,\ 2 + x - 2x^2,\ 1 - 2x + 4x^2\}$
(b) $\{1 + 2x + x^2,\ 3 + x^2,\ x + x^2\}$
(c) $\{1 - 2x - 2x^2,\ 2 + 3x - x^2,\ 1 - x + 6x^2\}$
(d) $\{-1 + 2x + 4x^2,\ 3 - 4x - 10x^2,\ -2 - 5x - 6x^2\}$
(e) $\{1 + 2x - x^2,\ 4 - 2x + x^2,\ -1 + 18x - 9x^2\}$

4. Do the polynomials $x^3 - 2x^2 + 1$, $4x^2 - x + 3$, and $3x - 2$ generate $P_3(R)$?
Justify your answer.

5. Is $\{(1, 4, -6), (1, 5, 8), (2, 1, 1), (0, 1, 0)\}$ a linearly independent subset of
$R^3$? Justify your answer.

6. Give three different bases for $F^2$ and for $M_{2 \times 2}(F)$.

7. The vectors $u_1 = (2, -3, 1)$, $u_2 = (1, 4, -2)$, $u_3 = (-8, 12, -4)$,
$u_4 = (1, 37, -17)$, and $u_5 = (-3, -5, 8)$ generate $R^3$. Find a subset of the set
$\{u_1, u_2, u_3, u_4, u_5\}$ that is a basis for $R^3$.

8. Let W denote the subspace of $R^5$ consisting of all the vectors having
coordinates that sum to zero. The vectors

$$\begin{aligned} u_1 &= (2, -3, 4, -5, 2), & u_2 &= (-6, 9, -12, 15, -6), \\ u_3 &= (3, -2, 7, -9, 1), & u_4 &= (2, -8, 2, -2, 6), \\ u_5 &= (-1, 1, 2, 1, -3), & u_6 &= (0, -3, -18, 9, 12), \\ u_7 &= (1, 0, -2, 3, -2), & u_8 &= (2, -1, 1, -9, 7) \end{aligned}$$

generate W. Find a subset of the set $\{u_1, u_2, \ldots, u_8\}$ that is a basis for W.

9. The vectors $u_1 = (1, 1, 1, 1)$, $u_2 = (0, 1, 1, 1)$, $u_3 = (0, 0, 1, 1)$, and
$u_4 = (0, 0, 0, 1)$ form a basis for $F^4$. Find the unique representation
of an arbitrary vector $(a_1, a_2, a_3, a_4)$ in $F^4$ as a linear combination of
$u_1$, $u_2$, $u_3$, and $u_4$.

10. In each part, use the Lagrange interpolation formula to construct the
polynomial of smallest degree whose graph contains the following points.

(a) $(-2, -6), (-1, 5), (1, 3)$
(b) $(-4, 24), (1, 9), (3, 3)$
(c) $(-2, 3), (-1, -6), (1, 0), (3, -2)$
(d) $(-3, -30), (-2, 7), (0, 15), (1, 10)$

11. Let u and v be distinct vectors of a vector space V. Show that if $\{u, v\}$
is a basis for V and a and b are nonzero scalars, then both $\{u + v, au\}$
and $\{au, bv\}$ are also bases for V.

12. Let u, v, and w be distinct vectors of a vector space V. Show that if
$\{u, v, w\}$ is a basis for V, then $\{u + v + w, v + w, w\}$ is also a basis for V.

13. The set of solutions to the system of linear equations

$$\begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ 2x_1 - 3x_2 + x_3 &= 0 \end{aligned}$$

is a subspace of $R^3$. Find a basis for this subspace.

14. Find bases for the following subspaces of $F^5$:

$$W_1 = \{(a_1, a_2, a_3, a_4, a_5) \in F^5 : a_1 - a_3 - a_4 = 0\}$$

and

$$W_2 = \{(a_1, a_2, a_3, a_4, a_5) \in F^5 : a_2 = a_3 = a_4 \text{ and } a_1 + a_5 = 0\}.$$

What are the dimensions of $W_1$ and $W_2$?
15. The set of all $n \times n$ matrices having trace equal to zero is a subspace W
of $M_{n \times n}(F)$ (see Example 4 of Section 1.3). Find a basis for W. What
is the dimension of W?

16. The set of all upper triangular $n \times n$ matrices is a subspace W of
$M_{n \times n}(F)$ (see Exercise 12 of Section 1.3). Find a basis for W. What is
the dimension of W?

17. The set of all skew-symmetric $n \times n$ matrices is a subspace W of
$M_{n \times n}(F)$ (see Exercise 28 of Section 1.3). Find a basis for W. What is
the dimension of W?

18. Find a basis for the vector space in Example 5 of Section 1.2. Justify
your answer.

19. Complete the proof of Theorem 1.8.

20. Let V be a vector space having dimension n, and let S be a subset of V
that generates V.

(a) Prove that there is a subset of S that is a basis for V. (Be careful
not to assume that S is finite.)
(b) Prove that S contains at least n vectors.

21. Prove that a vector space is infinite-dimensional if and only if it contains
an infinite linearly independent subset.

22. Let $W_1$ and $W_2$ be subspaces of a finite-dimensional vector space V.
Determine necessary and sufficient conditions on $W_1$ and $W_2$ so that
$\dim(W_1 \cap W_2) = \dim(W_1)$.

23. Let $v_1, v_2, \ldots, v_k, v$ be vectors in a vector space V, and define
$W_1 = \mathrm{span}(\{v_1, v_2, \ldots, v_k\})$, and $W_2 = \mathrm{span}(\{v_1, v_2, \ldots, v_k, v\})$.

(a) Find necessary and sufficient conditions on v such that $\dim(W_1) = \dim(W_2)$.
(b) State and prove a relationship involving $\dim(W_1)$ and $\dim(W_2)$ in
the case that $\dim(W_1) \neq \dim(W_2)$.

24. Let f(x) be a polynomial of degree n in $P_n(R)$. Prove that for any
$g(x) \in P_n(R)$ there exist scalars $c_0, c_1, \ldots, c_n$ such that

$$g(x) = c_0 f(x) + c_1 f'(x) + c_2 f''(x) + \cdots + c_n f^{(n)}(x),$$

where $f^{(n)}(x)$ denotes the nth derivative of f(x).

25. Let V, W, and Z be as in Exercise 21 of Section 1.2. If V and W are
vector spaces over F of dimensions m and n, determine the dimension
of Z.


26. For a fixed $a \in R$, determine the dimension of the subspace of $P_n(R)$
defined by $\{f \in P_n(R) : f(a) = 0\}$.

27. Let $W_1$ and $W_2$ be the subspaces of $P(F)$ defined in Exercise 25 in
Section 1.3. Determine the dimensions of the subspaces $W_1 \cap P_n(F)$
and $W_2 \cap P_n(F)$.

28. Let V be a finite-dimensional vector space over C with dimension n.
Prove that if V is now regarded as a vector space over R, then dim V =
2n. (See Examples 11 and 12.)

Exercises 29-34 require knowledge of the sum and direct sum of subspaces, 
as defined in the exercises of Section 1.3. 


29. (a) Prove that if $W_1$ and $W_2$ are finite-dimensional subspaces of a
vector space V, then the subspace $W_1 + W_2$ is finite-dimensional,
and $\dim(W_1 + W_2) = \dim(W_1) + \dim(W_2) - \dim(W_1 \cap W_2)$. Hint:
Start with a basis $\{u_1, u_2, \ldots, u_k\}$ for $W_1 \cap W_2$ and extend this
set to a basis $\{u_1, u_2, \ldots, u_k, v_1, v_2, \ldots, v_m\}$ for $W_1$ and to a basis
$\{u_1, u_2, \ldots, u_k, w_1, w_2, \ldots, w_p\}$ for $W_2$.

(b) Let $W_1$ and $W_2$ be finite-dimensional subspaces of a vector space
V, and let $V = W_1 + W_2$. Deduce that V is the direct sum of $W_1$
and $W_2$ if and only if $\dim(V) = \dim(W_1) + \dim(W_2)$.

30. Let $V = M_{2 \times 2}(F)$,

$$W_1 = \left\{ \begin{pmatrix} a & b \\ c & a \end{pmatrix} \in V : a, b, c \in F \right\},$$

and

$$W_2 = \left\{ \begin{pmatrix} 0 & a \\ -a & b \end{pmatrix} \in V : a, b \in F \right\}.$$

Prove that $W_1$ and $W_2$ are subspaces of V, and find the dimensions of
$W_1$, $W_2$, $W_1 + W_2$, and $W_1 \cap W_2$.

31. Let $W_1$ and $W_2$ be subspaces of a vector space V having dimensions m
and n, respectively, where $m \ge n$.

(a) Prove that $\dim(W_1 \cap W_2) \le n$.
(b) Prove that $\dim(W_1 + W_2) \le m + n$.

32. (a) Find an example of subspaces $W_1$ and $W_2$ of $R^3$ with dimensions
m and n, where $m > n > 0$, such that $\dim(W_1 \cap W_2) = n$.

(b) Find an example of subspaces $W_1$ and $W_2$ of $R^3$ with dimensions
m and n, where $m > n > 0$, such that $\dim(W_1 + W_2) = m + n$.

(c) Find an example of subspaces $W_1$ and $W_2$ of $R^3$ with dimensions
m and n, where $m \ge n$, such that both $\dim(W_1 \cap W_2) < n$ and
$\dim(W_1 + W_2) < m + n$.


33. (a) Let $W_1$ and $W_2$ be subspaces of a vector space V such that
$V = W_1 \oplus W_2$. If $\beta_1$ and $\beta_2$ are bases for $W_1$ and $W_2$, respectively, show
that $\beta_1 \cap \beta_2 = \emptyset$ and $\beta_1 \cup \beta_2$ is a basis for V.
(b) Conversely, let $\beta_1$ and $\beta_2$ be disjoint bases for subspaces $W_1$ and
$W_2$, respectively, of a vector space V. Prove that if $\beta_1 \cup \beta_2$ is a
basis for V, then $V = W_1 \oplus W_2$.

34. (a) Prove that if $W_1$ is any subspace of a finite-dimensional vector
space V, then there exists a subspace $W_2$ of V such that
$V = W_1 \oplus W_2$.
(b) Let $V = R^2$ and $W_1 = \{(a_1, 0) : a_1 \in R\}$. Give examples of two
different subspaces $W_2$ and $W_2'$ such that $V = W_1 \oplus W_2$ and
$V = W_1 \oplus W_2'$.


The following exercise requires familiarity with Exercise 31 of Section 1.3. 


35. Let W be a subspace of a finite-dimensional vector space V, and consider
the basis $\{u_1, u_2, \ldots, u_k\}$ for W. Let $\{u_1, u_2, \ldots, u_k, u_{k+1}, \ldots, u_n\}$ be
an extension of this basis to a basis for V.

(a) Prove that $\{u_{k+1} + W, u_{k+2} + W, \ldots, u_n + W\}$ is a basis for V/W.
(b) Derive a formula relating dim(V), dim(W), and dim(V/W).


1.7* MAXIMAL LINEARLY INDEPENDENT SUBSETS 


In this section, several significant results from Section 1.6 are extended to 
infinite-dimensional vector spaces. Our principal goal here is to prove that 
every vector space has a basis. This result is important in the study of 
infinite-dimensional vector spaces because it is often difficult to construct an 
explicit basis for such a space. Consider, for example, the vector space of 
real numbers over the field of rational numbers. There is no obvious way to 
construct a basis for this space, and yet it follows from the results of this 
section that such a basis does exist. 

The difficulty that arises in extending the theorems of the preceding sec- 
tion to infinite-dimensional vector spaces is that the principle of mathematical 
induction, which played a crucial role in many of the proofs of Section 1.6, 
is no longer adequate. Instead, a more general result called the maximal 
principle is needed. Before stating this principle, we need to introduce some 
terminology. 


Definition. Let F be a family of sets. A member M of F is called 
maximal (with respect to set inclusion) if M is contained in no member of 
F other than M itself. 
Example 1


Let F be the family of all subsets of a nonempty set S. (This family F is
called the power set of S.) The set S is easily seen to be a maximal element
of F.


Example 2 


Let S and T be disjoint nonempty sets, and let F be the union of their power 
sets. Then S and T are both maximal elements of F. 


Example 3 


Let F be the family of all finite subsets of an infinite set S. Then F has no 
maximal element. For if M is any member of F and s is any element of S 
that is not in M, then $M \cup \{s\}$ is a member of F that contains M as a proper
subset. 


Definition. A collection of sets C is called a chain (or nest or tower)
if for each pair of sets A and B in C, either $A \subseteq B$ or $B \subseteq A$.


Example 4 


For each positive integer n let $A_n = \{1, 2, \ldots, n\}$. Then the collection of
sets $C = \{A_n : n = 1, 2, 3, \ldots\}$ is a chain. In fact, $A_m \subseteq A_n$ if and only if
$m \le n$.


With this terminology we can now state the maximal principle. 


Maximal Principle.* Let F be a family of sets. If, for each chain $C \subseteq F$,
there exists a member of F that contains each member of C, then F contains
a maximal member.


Because the maximal principle guarantees the existence of maximal el- 
ements in a family of sets satisfying the hypothesis above, it is useful to 
reformulate the definition of a basis in terms of a maximal property. In The- 
orem 1.12, we show that this is possible; in fact, the concept defined next is 
equivalent to a basis. 


Definition. Let S be a subset of a vector space V. A maximal linearly 
independent subset of S is a subset B of S satisfying both of the following 
conditions. 

(a) B is linearly independent. 
(b) The only linearly independent subset of S that contains B is B itself. 


*The Maximal Principle is logically equivalent to the Axiom of Choice, which
is an assumption in most axiomatic developments of set theory. For a treatment 
of set theory using the Maximal Principle, see John L. Kelley, General Topology, 
Graduate Texts in Mathematics Series, Vol. 27, Springer-Verlag, 1991. 
Example 5
Example 2 of Section 1.4 shows that

$$\{x^3 - 2x^2 - 5x - 3,\ 3x^3 - 5x^2 - 4x - 9\}$$

is a maximal linearly independent subset of

$$S = \{2x^3 - 2x^2 + 12x - 6,\ x^3 - 2x^2 - 5x - 3,\ 3x^3 - 5x^2 - 4x - 9\}$$

in $P_3(R)$. In this case, however, any subset of S consisting of two polynomials
is easily shown to be a maximal linearly independent subset of S. Thus
maximal linearly independent subsets of a set need not be unique.
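Both claims in Example 5 can be checked mechanically, writing each polynomial in $P_3(R)$ as its coefficient list $(a_0, a_1, a_2, a_3)$. The sketch below (helper names are hypothetical) confirms that the first polynomial equals $-4$ times the second plus $2$ times the third, so S is linearly dependent, and that no polynomial in S is a scalar multiple of another. For coefficient lists u and v, the latter amounts to some cross product $u_i v_j - u_j v_i$ being nonzero.

```python
from itertools import combinations

# Coefficient lists (a0, a1, a2, a3) for the polynomials in S (Example 5).
p1 = (-6, 12, -2, 2)   # 2x^3 - 2x^2 + 12x - 6
p2 = (-3, -5, -2, 1)   # x^3 - 2x^2 - 5x - 3
p3 = (-9, -4, -5, 3)   # 3x^3 - 5x^2 - 4x - 9


def lin_comb(a, u, b, v):
    """Coefficient list of a*u + b*v."""
    return tuple(a * x + b * y for x, y in zip(u, v))


def pair_independent(u, v):
    """True when neither u nor v is a scalar multiple of the other."""
    return any(u[i] * v[j] - u[j] * v[i] != 0
               for i, j in combinations(range(len(u)), 2))


print(lin_comb(-4, p2, 2, p3) == p1)  # True: S is linearly dependent
print(all(pair_independent(u, v) for u, v in combinations((p1, p2, p3), 2)))  # True
```

Since each pair is linearly independent and the only larger subset of S is S itself, which is dependent, every two-element subset is maximal.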


A basis $\beta$ for a vector space V is a maximal linearly independent subset
of V, because

1. $\beta$ is linearly independent by definition.
2. If $v \in V$ and $v \notin \beta$, then $\beta \cup \{v\}$ is linearly dependent by Theorem 1.7
(p. 39) because $\mathrm{span}(\beta) = V$.


Our next result shows that the converse of this statement is also true. 


Theorem 1.12. Let V be a vector space and S a subset that generates
V. If $\beta$ is a maximal linearly independent subset of S, then $\beta$ is a basis for V.


Proof. Let $\beta$ be a maximal linearly independent subset of S. Because $\beta$
is linearly independent, it suffices to prove that $\beta$ generates V. We claim
that $S \subseteq \mathrm{span}(\beta)$, for otherwise there exists a $v \in S$ such that $v \notin \mathrm{span}(\beta)$.
Since Theorem 1.7 (p. 39) implies that $\beta \cup \{v\}$ is linearly independent, we
have contradicted the maximality of $\beta$. Therefore $S \subseteq \mathrm{span}(\beta)$. Because
$\mathrm{span}(S) = V$, it follows from Theorem 1.5 (p. 30) that $\mathrm{span}(\beta) = V$. ■
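When S is finite, a maximal linearly independent subset can be found greedily, with no appeal to the maximal principle: scan S and keep a vector only if it is not in the span of the vectors kept so far. The sketch below does this for vectors in $Q^n$ (an assumption made so that exact arithmetic with `fractions.Fraction` is available); the names `rank` and `maximal_independent_subset` are hypothetical. Every rejected vector lies in the span of the kept ones, so adjoining any of them produces a dependent set, and by Theorem 1.12 the result is a basis for span(S).

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of a list of vectors in Q^n (row reduction)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def maximal_independent_subset(S):
    """Greedily choose a maximal linearly independent subset of the finite list S."""
    chosen = []
    for v in S:
        # keep v only if it enlarges the span of the vectors already chosen
        if rank(chosen + [v]) > len(chosen):
            chosen.append(v)
    return chosen


S = [(1, 0, 0), (2, 0, 0), (0, 1, 0), (1, 1, 0)]
print(maximal_independent_subset(S))  # [(1, 0, 0), (0, 1, 0)]
```

The scan depends on the order of S; a different order can yield a different maximal subset, in line with the non-uniqueness noted in Example 5.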


Thus a subset of a vector space is a basis if and only if it is a maximal 
linearly independent subset of the vector space. Therefore we can accomplish 
our goal of proving that every vector space has a basis by showing that every 
vector space contains a maximal linearly independent subset. This result 
follows immediately from the next theorem. 


Theorem 1.13. Let S be a linearly independent subset of a vector space 
V. There exists a maximal linearly independent subset of V that contains S. 


Proof. Let F denote the family of all linearly independent subsets of V 
that contain S. In order to show that F contains a maximal element, we must 
show that if C is a chain in F, then there exists a member U of F that contains
each member of C. We claim that U, the union of the members of C, is the
desired set. Clearly U contains each member of C, and so it suffices to prove
that $U \in F$ (i.e., that U is a linearly independent subset of V that contains S).
Because each member of C is a subset of V containing S, we have $S \subseteq U \subseteq V$.
Thus we need only prove that U is linearly independent. Let $u_1, u_2, \ldots, u_n$
be in U and $a_1, a_2, \ldots, a_n$ be scalars such that $a_1u_1 + a_2u_2 + \cdots + a_nu_n = 0$.
Because $u_i \in U$ for $i = 1, 2, \ldots, n$, there exists a set $A_i$ in C such that $u_i \in A_i$.
But since C is a chain, one of these sets, say $A_k$, contains all the others. Thus
$u_i \in A_k$ for $i = 1, 2, \ldots, n$. However, $A_k$ is a linearly independent set; so
$a_1u_1 + a_2u_2 + \cdots + a_nu_n = 0$ implies that $a_1 = a_2 = \cdots = a_n = 0$. It follows
that U is linearly independent.

The maximal principle implies that F has a maximal element. This element
is easily seen to be a maximal linearly independent subset of V that
contains S. ■


Corollary. Every vector space has a basis. 


It can be shown, analogously to Corollary 1 of the replacement theorem 
(p. 46), that every basis for an infinite-dimensional vector space has the same 
cardinality. (Sets have the same cardinality if there is a one-to-one and onto 
mapping between them.) (See, for example, N. Jacobson, Lectures in Ab- 
stract Algebra, vol. 2, Linear Algebra, D. Van Nostrand Company, New 
York, 1953, p. 240.) 

Exercises 4-7 extend other results from Section 1.6 to infinite-dimensional 
vector spaces. 


EXERCISES 


1. Label the following statements as true or false. 


(a) Every family of sets contains a maximal element. 

(b) Every chain contains a maximal element. 

(c) If a family of sets has a maximal element, then that maximal 
element is unique. 

(d) If a chain of sets has a maximal element, then that maximal element
is unique.

(e) A basis for a vector space is a maximal linearly independent subset 
of that vector space. 

(f) A maximal linearly independent subset of a vector space is a basis 
for that vector space. 


2. Show that the set of convergent sequences is an infinite-dimensional 
subspace of the vector space of all sequences of real numbers. (See 
Exercise 21 in Section 1.3.) 


3. Let V be the set of real numbers regarded as a vector space over the
field of rational numbers. Prove that V is infinite-dimensional. Hint:
Use the fact that $\pi$ is transcendental, that is, $\pi$ is not a zero of any
polynomial with rational coefficients.

4. Let W be a subspace of a (not necessarily finite-dimensional) vector 
space V. Prove that any basis for W is a subset of a basis for V. 

5. Prove the following infinite-dimensional version of Theorem 1.8 (p. 43):
Let $\beta$ be a subset of an infinite-dimensional vector space V. Then $\beta$ is a
basis for V if and only if for each nonzero vector v in V, there exist unique
vectors $u_1, u_2, \ldots, u_n$ in $\beta$ and unique nonzero scalars $c_1, c_2, \ldots, c_n$ such
that $v = c_1u_1 + c_2u_2 + \cdots + c_nu_n$.

6. Prove the following generalization of Theorem 1.9 (p. 44): Let $S_1$ and
$S_2$ be subsets of a vector space V such that $S_1 \subseteq S_2$. If $S_1$ is linearly
independent and $S_2$ generates V, then there exists a basis $\beta$ for V such
that $S_1 \subseteq \beta \subseteq S_2$. Hint: Apply the maximal principle to the family of
all linearly independent subsets of $S_2$ that contain $S_1$, and proceed as
in the proof of Theorem 1.13.

7. Prove the following generalization of the replacement theorem. Let $\beta$
be a basis for a vector space V, and let S be a linearly independent
subset of V. There exists a subset $S_1$ of $\beta$ such that $S \cup S_1$ is a basis
for V.
Linear Transformations
and Matrices 


2.1 Linear Transformations, Null spaces, and Ranges 

2.2 The Matrix Representation of a Linear Transformation 

2.3 Composition of Linear Transformations and Matrix Multiplication

2.4 Invertibility and Isomorphisms

2.5 The Change of Coordinate Matrix 

2.6* Dual Spaces 

2.7* Homogeneous Linear Differential Equations with Constant Coefficients 


In Chapter 1, we developed the theory of abstract vector spaces in considerable
detail. It is now natural to consider those functions defined on vector
spaces that in some sense “preserve” the structure. These special functions
are called linear transformations, and they abound in both pure and applied
mathematics. In calculus, the operations of differentiation and integration
provide us with two of the most important examples of linear transformations
(see Examples 6 and 7 of Section 2.1). These two examples allow us
to reformulate many of the problems in differential and integral equations in
terms of linear transformations on particular vector spaces (see Sections 2.7
and 5.2).

In geometry, rotations, reflections, and projections (see Examples 2, 3, 
and 4 of Section 2.1) provide us with another class of linear transformations. 
Later we use these transformations to study rigid motions in $R^n$ (Section
6.10). 

In the remaining chapters, we see further examples of linear transforma- 
tions occurring in both the physical and the social sciences. Throughout this 
chapter, we assume that all vector spaces are over a common field F.


2.1 LINEAR TRANSFORMATIONS, NULL SPACES, AND RANGES
In this section, we consider a number of examples of linear transformations. 


Many of these transformations are studied in more detail in later sections. 
Recall that a function T with domain V and codomain W is denoted by 
$T\colon V \to W$. (See Appendix B.)


Definition. Let V and W be vector spaces (over F). We call a function
$T\colon V \to W$ a linear transformation from V to W if, for all $x, y \in V$ and
$c \in F$, we have
(a) $T(x + y) = T(x) + T(y)$ and
(b) $T(cx) = cT(x)$.


If the underlying field F is the field of rational numbers, then (a) implies
(b) (see Exercise 37), but in general, (a) and (b) are logically independent.
See Exercises 38 and 39.

We often simply call T linear. The reader should verify the following
properties of a function $T\colon V \to W$. (See Exercise 7.)

1. If T is linear, then $T(0) = 0$.
2. T is linear if and only if $T(cx + y) = cT(x) + T(y)$ for all $x, y \in V$ and
$c \in F$.
3. If T is linear, then $T(x - y) = T(x) - T(y)$ for all $x, y \in V$.
4. T is linear if and only if, for $x_1, x_2, \ldots, x_n \in V$ and $a_1, a_2, \ldots, a_n \in F$,
we have
$$T\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} a_i T(x_i).$$

We generally use property 2 to prove that a given transformation is linear.


Example 1 
Define

$$T\colon R^2 \to R^2 \quad \text{by} \quad T(a_1, a_2) = (2a_1 + a_2, a_1).$$

To show that T is linear, let $c \in R$ and $x, y \in R^2$, where $x = (b_1, b_2)$ and
$y = (d_1, d_2)$. Since

$$cx + y = (cb_1 + d_1, cb_2 + d_2),$$

we have

$$T(cx + y) = (2(cb_1 + d_1) + cb_2 + d_2,\ cb_1 + d_1).$$

Also

$$\begin{aligned} cT(x) + T(y) &= c(2b_1 + b_2, b_1) + (2d_1 + d_2, d_1) \\ &= (2cb_1 + cb_2 + 2d_1 + d_2,\ cb_1 + d_1) \\ &= (2(cb_1 + d_1) + cb_2 + d_2,\ cb_1 + d_1). \end{aligned}$$

So T is linear.
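Property 2 can also be spot-checked numerically. The sketch below, whose helper names are hypothetical, evaluates both sides of $T(cx + y) = cT(x) + T(y)$ for the map of Example 1 at random integer inputs. Passing such a test is only evidence, not a proof (the algebra above is the proof), but a single failing case does show that a map is not linear.

```python
import random


def T(a):
    """Example 1: T(a1, a2) = (2*a1 + a2, a1)."""
    a1, a2 = a
    return (2 * a1 + a2, a1)


def add(x, y):
    return tuple(p + q for p, q in zip(x, y))


def scale(c, x):
    return tuple(c * p for p in x)


def satisfies_property_2(f, trials=1000, seed=0):
    """Check f(c*x + y) == c*f(x) + f(y) on random integer samples in R^2."""
    rng = random.Random(seed)
    for _ in range(trials):
        c = rng.randint(-10, 10)
        x = (rng.randint(-10, 10), rng.randint(-10, 10))
        y = (rng.randint(-10, 10), rng.randint(-10, 10))
        if f(add(scale(c, x), y)) != add(scale(c, f(x)), f(y)):
            return False  # a counterexample: f is not linear
    return True


print(satisfies_property_2(T))                           # True
print(satisfies_property_2(lambda a: (a[0] + 1, a[1])))  # False: a translation is not linear
```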
Figure 2.1: (a) Rotation, (b) Reflection, (c) Projection


As we will see in Chapter 6, the applications of linear algebra to geometry 
are wide and varied. The main reason for this is that most of the important 
geometrical transformations are linear. Three particular transformations that 
we now consider are rotation, reflection, and projection. We leave the proofs 
of linearity to the reader. 

Example 2 


For any angle $\theta$, define $T_\theta\colon R^2 \to R^2$ by the rule: $T_\theta(a_1, a_2)$ is the vector
obtained by rotating $(a_1, a_2)$ counterclockwise by $\theta$ if $(a_1, a_2) \neq (0, 0)$, and
$T_\theta(0, 0) = (0, 0)$. Then $T_\theta\colon R^2 \to R^2$ is a linear transformation that is called
the rotation by $\theta$.


We determine an explicit formula for $T_\theta$. Fix a nonzero vector $(a_1, a_2) \in R^2$.
Let $\alpha$ be the angle that $(a_1, a_2)$ makes with the positive x-axis (see
Figure 2.1(a)), and let $r = \sqrt{a_1^2 + a_2^2}$. Then $a_1 = r\cos\alpha$ and $a_2 = r\sin\alpha$.
Also, $T_\theta(a_1, a_2)$ has length r and makes an angle $\alpha + \theta$ with the positive
x-axis. It follows that

$$\begin{aligned} T_\theta(a_1, a_2) &= (r\cos(\alpha + \theta),\ r\sin(\alpha + \theta)) \\ &= (r\cos\alpha\cos\theta - r\sin\alpha\sin\theta,\ r\cos\alpha\sin\theta + r\sin\alpha\cos\theta) \\ &= (a_1\cos\theta - a_2\sin\theta,\ a_1\sin\theta + a_2\cos\theta). \end{aligned}$$

Finally, observe that this same formula is valid for $(a_1, a_2) = (0, 0)$.

It is now easy to show, as in Example 1, that $T_\theta$ is linear.
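The derivation can be sanity-checked in floating point. The sketch below compares the closed-form rule $(a_1\cos\theta - a_2\sin\theta,\ a_1\sin\theta + a_2\cos\theta)$ with the geometric recipe it came from: convert to polar form $(r, \alpha)$, add $\theta$ to the angle, and convert back. The function names are hypothetical, and agreement is checked only up to rounding with `math.isclose`.

```python
import math


def rotate(theta, a1, a2):
    """T_theta(a1, a2) from the closed-form formula."""
    return (a1 * math.cos(theta) - a2 * math.sin(theta),
            a1 * math.sin(theta) + a2 * math.cos(theta))


def rotate_polar(theta, a1, a2):
    """Rotate by adding theta to the polar angle alpha of (a1, a2)."""
    if (a1, a2) == (0, 0):
        return (0.0, 0.0)
    r = math.hypot(a1, a2)      # r = sqrt(a1^2 + a2^2)
    alpha = math.atan2(a2, a1)  # angle with the positive x-axis
    return (r * math.cos(alpha + theta), r * math.sin(alpha + theta))


def close(p, q, tol=1e-9):
    """Componentwise comparison up to an absolute tolerance."""
    return all(math.isclose(s, t, abs_tol=tol) for s, t in zip(p, q))


for theta in (0.3, math.pi / 2, 2.0, -1.1):
    for point in ((1.0, 0.0), (3.0, -4.0), (-2.5, 0.7), (0.0, 0.0)):
        assert close(rotate(theta, *point), rotate_polar(theta, *point))
print(rotate(math.pi / 2, 1.0, 0.0))  # approximately (0.0, 1.0)
```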


Example 3 

Define $T\colon R^2 \to R^2$ by $T(a_1, a_2) = (a_1, -a_2)$. T is called the reflection
about the x-axis. (See Figure 2.1(b).)

Example 4 


Define T: R? — R? by T(ai, az) = (a1,0). T is called the projection on the 
x-axis. (See Figure 2.1(c).) 
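For comparison, the maps of Examples 3 and 4 can be written as small Python functions (the names are our own). The sketch also confirms two easily verified geometric facts: reflecting twice returns the original vector, and projecting twice is the same as projecting once.

```python
def reflect_x(v):
    """Example 3: reflection about the x-axis, T(a1, a2) = (a1, -a2)."""
    a1, a2 = v
    return (a1, -a2)

def project_x(v):
    """Example 4: projection on the x-axis, T(a1, a2) = (a1, 0)."""
    a1, a2 = v
    return (a1, 0)

v = (3, 5)
print(reflect_x(v), project_x(v))                # (3, -5) (3, 0)
print(reflect_x(reflect_x(v)) == v)              # True: reflecting twice undoes it
print(project_x(project_x(v)) == project_x(v))   # True: projecting again changes nothing
```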
We now look at some additional examples of linear transformations.


Example 5

Define $T\colon M_{m\times n}(F) \to M_{n\times m}(F)$ by $T(A) = A^t$, where $A^t$ is the transpose of $A$, defined in Section 1.3. Then $T$ is a linear transformation by Exercise 3 of Section 1.3.


Example 6

Define $T\colon P_n(\mathbb{R}) \to P_{n-1}(\mathbb{R})$ by $T(f(x)) = f'(x)$, where $f'(x)$ denotes the derivative of $f(x)$. To show that $T$ is linear, let $g(x), h(x) \in P_n(\mathbb{R})$ and $a \in \mathbb{R}$. Now
$$T(ag(x) + h(x)) = (ag(x) + h(x))' = ag'(x) + h'(x) = aT(g(x)) + T(h(x)).$$
So by property 2 above, $T$ is linear.
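If a polynomial in $P_n(\mathbb{R})$ is represented by its coefficient list $[a_0, a_1, \ldots, a_n]$ (an encoding we assume for illustration), the derivative of Example 6 becomes a simple list operation, and the identity $T(ag + h) = aT(g) + T(h)$ can be checked exactly with rational arithmetic. The helpers `derivative` and `combine` are our own names.

```python
from fractions import Fraction

def derivative(coeffs):
    """Differentiate a0 + a1 x + ... + an x^n given as [a0, a1, ..., an].

    The result has one fewer coefficient, mirroring T: P_n(R) -> P_{n-1}(R).
    """
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def combine(a, g, h):
    """Coefficient list of a*g + h (lists of equal length)."""
    return [a * gi + hi for gi, hi in zip(g, h)]

g = [Fraction(1), Fraction(-2), Fraction(0), Fraction(4)]   # 1 - 2x + 4x^3
h = [Fraction(5), Fraction(0), Fraction(3), Fraction(-1)]   # 5 + 3x^2 - x^3
a = Fraction(2, 3)

lhs = derivative(combine(a, g, h))                # T(ag + h)
rhs = combine(a, derivative(g), derivative(h))    # aT(g) + T(h)
print(lhs == rhs)  # True
```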


Example 7

Let $V = C(\mathbb{R})$, the vector space of continuous real-valued functions on $\mathbb{R}$. Let $a, b \in \mathbb{R}$, $a < b$. Define $T\colon V \to \mathbb{R}$ by
$$T(f) = \int_a^b f(t)\,dt$$
for all $f \in V$. Then $T$ is a linear transformation because the definite integral of a linear combination of functions is the same as the linear combination of the definite integrals of the functions.
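The functional of Example 7 is defined on all of $C(\mathbb{R})$. The sketch below restricts it, as an assumption for illustration, to polynomial functions, where the integral can be computed exactly from the coefficients. The helper `integrate` is hypothetical and is not a general numerical integrator.

```python
from fractions import Fraction

def integrate(coeffs, a, b):
    """Exact integral from a to b of sum_k coeffs[k] * t**k (polynomials only)."""
    total = Fraction(0)
    for k, ck in enumerate(coeffs):
        # An antiderivative of t^k is t^(k+1) / (k+1).
        total += Fraction(ck) * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
    return total

f = [1, 0, 3]          # 1 + 3t^2
g = [0, 2, 0]          # 2t
c = Fraction(-5, 2)
a, b = 0, 2

combo = [c * fi + gi for fi, gi in zip(f, g)]   # coefficients of cf + g
print(integrate(combo, a, b) == c * integrate(f, a, b) + integrate(g, a, b))  # True
print(integrate(f, a, b))  # 10
```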


Two very important examples of linear transformations that appear fre- 
quently in the remainder of the book, and therefore deserve their own nota- 
tion, are the identity and zero transformations. 

For vector spaces $V$ and $W$ (over $F$), we define the identity transformation $I_V\colon V \to V$ by $I_V(x) = x$ for all $x \in V$ and the zero transformation $T_0\colon V \to W$ by $T_0(x) = 0$ for all $x \in V$. It is clear that both of these transformations are linear. We often write $I$ instead of $I_V$.

We now turn our attention to two very important sets associated with 
linear transformations: the range and null space. The determination of these 
sets allows us to examine more closely the intrinsic properties of a linear 
transformation. 


Definitions. Let $V$ and $W$ be vector spaces, and let $T\colon V \to W$ be linear. We define the null space (or kernel) $N(T)$ of $T$ to be the set of all vectors $x$ in $V$ such that $T(x) = 0$; that is, $N(T) = \{x \in V : T(x) = 0\}$.

We define the range (or image) $R(T)$ of $T$ to be the subset of $W$ consisting of all images (under $T$) of vectors in $V$; that is, $R(T) = \{T(x) : x \in V\}$.
Example 8

Let $V$ and $W$ be vector spaces, and let $I\colon V \to V$ and $T_0\colon V \to W$ be the identity and zero transformations, respectively. Then $N(I) = \{0\}$, $R(I) = V$, $N(T_0) = V$, and $R(T_0) = \{0\}$.


Example 9
Let $T\colon \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by
$$T(a_1, a_2, a_3) = (a_1 - a_2,\, 2a_3).$$
It is left as an exercise to verify that
$$N(T) = \{(a, a, 0) : a \in \mathbb{R}\} \quad\text{and}\quad R(T) = \mathbb{R}^2.$$
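A small Python check of Example 9, using a hypothetical `rank` helper (Gaussian elimination over the rationals). It relies on the fact, proved as Theorem 2.2 below, that the images of a basis span $R(T)$: the images of $e_1, e_2, e_3$ have rank 2, so $R(T) = \mathbb{R}^2$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def T(a1, a2, a3):
    """Example 9: T(a1, a2, a3) = (a1 - a2, 2*a3)."""
    return (a1 - a2, 2 * a3)

images = [T(1, 0, 0), T(0, 1, 0), T(0, 0, 1)]   # (1, 0), (-1, 0), (0, 2)
print(rank(images))   # 2, so R(T) = R^2
print(T(7, 7, 0))     # (0, 0): vectors of the form (a, a, 0) lie in N(T)
```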


In Examples 8 and 9, we see that the range and null space of each of the 
linear transformations is a subspace. The next result shows that this is true 
in general. 


Theorem 2.1. Let $V$ and $W$ be vector spaces and $T\colon V \to W$ be linear. Then $N(T)$ and $R(T)$ are subspaces of $V$ and $W$, respectively.

Proof. To clarify the notation, we use the symbols $0_V$ and $0_W$ to denote the zero vectors of $V$ and $W$, respectively.

Since $T(0_V) = 0_W$, we have that $0_V \in N(T)$. Let $x, y \in N(T)$ and $c \in F$. Then $T(x + y) = T(x) + T(y) = 0_W + 0_W = 0_W$, and $T(cx) = cT(x) = c0_W = 0_W$. Hence $x + y \in N(T)$ and $cx \in N(T)$, so that $N(T)$ is a subspace of $V$.

Because $T(0_V) = 0_W$, we have that $0_W \in R(T)$. Now let $x, y \in R(T)$ and $c \in F$. Then there exist $v$ and $w$ in $V$ such that $T(v) = x$ and $T(w) = y$. So $T(v + w) = T(v) + T(w) = x + y$, and $T(cv) = cT(v) = cx$. Thus $x + y \in R(T)$ and $cx \in R(T)$, so $R(T)$ is a subspace of $W$. ∎


The next theorem provides a method for finding a spanning set for the 
range of a linear transformation. With this accomplished, a basis for the 
range is easy to discover using the technique of Example 6 of Section 1.6. 


Theorem 2.2. Let $V$ and $W$ be vector spaces, and let $T\colon V \to W$ be linear. If $\beta = \{v_1, v_2, \ldots, v_n\}$ is a basis for $V$, then
$$R(T) = \operatorname{span}(T(\beta)) = \operatorname{span}(\{T(v_1), T(v_2), \ldots, T(v_n)\}).$$

Proof. Clearly $T(v_i) \in R(T)$ for each $i$. Because $R(T)$ is a subspace, $R(T)$ contains $\operatorname{span}(\{T(v_1), T(v_2), \ldots, T(v_n)\}) = \operatorname{span}(T(\beta))$ by Theorem 1.5 (p. 30).
Now suppose that $w \in R(T)$. Then $w = T(v)$ for some $v \in V$. Because $\beta$ is a basis for $V$, we have
$$v = \sum_{i=1}^{n} a_i v_i \quad\text{for some } a_1, a_2, \ldots, a_n \in F.$$
Since $T$ is linear, it follows that
$$w = T(v) = \sum_{i=1}^{n} a_i T(v_i) \in \operatorname{span}(T(\beta)).$$
So $R(T)$ is contained in $\operatorname{span}(T(\beta))$. ∎

It should be noted that Theorem 2.2 is true if $\beta$ is infinite, that is, $R(T) = \operatorname{span}(\{T(v) : v \in \beta\})$. (See Exercise 33.)
The next example illustrates the usefulness of Theorem 2.2. 
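Example 10
Define the linear transformation $T\colon P_2(\mathbb{R}) \to M_{2\times 2}(\mathbb{R})$ by
$$T(f(x)) = \begin{pmatrix} f(1) - f(2) & 0 \\ 0 & f(0) \end{pmatrix}.$$
Since $\beta = \{1, x, x^2\}$ is a basis for $P_2(\mathbb{R})$, we have
$$\begin{aligned} R(T) &= \operatorname{span}(T(\beta)) = \operatorname{span}(\{T(1), T(x), T(x^2)\}) \\ &= \operatorname{span}\left(\left\{ \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} -3 & 0 \\ 0 & 0 \end{pmatrix} \right\}\right) \\ &= \operatorname{span}\left(\left\{ \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix} \right\}\right). \end{aligned}$$
Thus we have found a basis for $R(T)$, and so $\dim(R(T)) = 2$.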
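The computation in Example 10 can be mirrored in Python by flattening each $2 \times 2$ image into a 4-tuple (row by row) and computing the rank of the three images. The coefficient-list encoding of polynomials and the `rank` helper are our own illustrative choices.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def evaluate(coeffs, t):
    """Value at t of the polynomial with coefficient list [a0, a1, a2, ...]."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

def T(coeffs):
    """Example 10, with the 2x2 image flattened row by row into a 4-tuple."""
    return (evaluate(coeffs, 1) - evaluate(coeffs, 2), 0, 0, evaluate(coeffs, 0))

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 1, x, x^2
images = [T(p) for p in basis]
print(images)        # [(0, 0, 0, 1), (-1, 0, 0, 0), (-3, 0, 0, 0)]
print(rank(images))  # 2
```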


As in Chapter 1, we measure the “size” of a subspace by its dimension. 
The null space and range are so important that we attach special names to 
their respective dimensions. 


Definitions. Let $V$ and $W$ be vector spaces, and let $T\colon V \to W$ be linear. If $N(T)$ and $R(T)$ are finite-dimensional, then we define the nullity of $T$, denoted $\operatorname{nullity}(T)$, and the rank of $T$, denoted $\operatorname{rank}(T)$, to be the dimensions of $N(T)$ and $R(T)$, respectively.


Reflecting on the action of a linear transformation, we see intuitively that 
the larger the nullity, the smaller the rank. In other words, the more vectors 
that are carried into 0, the smaller the range. The same heuristic reasoning 
tells us that the larger the rank, the smaller the nullity. This balance between 
rank and nullity is made precise in the next theorem, appropriately called the 
dimension theorem. 
Theorem 2.3 (Dimension Theorem). Let $V$ and $W$ be vector spaces, and let $T\colon V \to W$ be linear. If $V$ is finite-dimensional, then
$$\operatorname{nullity}(T) + \operatorname{rank}(T) = \dim(V).$$

Proof. Suppose that $\dim(V) = n$, $\dim(N(T)) = k$, and $\{v_1, v_2, \ldots, v_k\}$ is a basis for $N(T)$. By the corollary to Theorem 1.11 (p. 51), we may extend $\{v_1, v_2, \ldots, v_k\}$ to a basis $\beta = \{v_1, v_2, \ldots, v_n\}$ for $V$. We claim that $S = \{T(v_{k+1}), T(v_{k+2}), \ldots, T(v_n)\}$ is a basis for $R(T)$.

First we prove that $S$ generates $R(T)$. Using Theorem 2.2 and the fact that $T(v_i) = 0$ for $1 \le i \le k$, we have
$$\begin{aligned} R(T) &= \operatorname{span}(\{T(v_1), T(v_2), \ldots, T(v_n)\}) \\ &= \operatorname{span}(\{T(v_{k+1}), T(v_{k+2}), \ldots, T(v_n)\}) = \operatorname{span}(S). \end{aligned}$$

Now we prove that $S$ is linearly independent. Suppose that
$$\sum_{i=k+1}^{n} b_i T(v_i) = 0 \quad\text{for } b_{k+1}, b_{k+2}, \ldots, b_n \in F.$$
Using the fact that $T$ is linear, we have
$$T\Big(\sum_{i=k+1}^{n} b_i v_i\Big) = 0.$$
So
$$\sum_{i=k+1}^{n} b_i v_i \in N(T).$$
Hence there exist $c_1, c_2, \ldots, c_k \in F$ such that
$$\sum_{i=k+1}^{n} b_i v_i = \sum_{i=1}^{k} c_i v_i \quad\text{or}\quad \sum_{i=1}^{k} (-c_i) v_i + \sum_{i=k+1}^{n} b_i v_i = 0.$$
Since $\beta$ is a basis for $V$, we have $b_i = 0$ for all $i$. Hence $S$ is linearly independent. Notice that this argument also shows that $T(v_{k+1}), T(v_{k+2}), \ldots, T(v_n)$ are distinct; therefore $\operatorname{rank}(T) = n - k$. ∎


If we apply the dimension theorem to the linear transformation $T$ in Example 9, we have that $\operatorname{nullity}(T) + 2 = 3$, so $\operatorname{nullity}(T) = 1$.
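The proof suggests a concrete check for a map given by the coefficient rows of a matrix: the coefficients of Example 9, plus a hypothetical $3 \times 4$ example. Row reduction over the rationals yields one pivot for each dimension of the range and one null-space vector for each free column. The sketch below builds those null-space vectors, confirms they are sent to 0, and compares the counts. The vectors are linearly independent because each has a 1 in its own free coordinate and 0 in the other free coordinates. All helper names are our own.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, pivot_columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        p = m[r][col]
        m[r] = [x / p for x in m[r]]          # scale the pivot row so the pivot is 1
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return m, pivots

def null_space_basis(rows):
    """Basis of the null space (one vector per free column) and the rank."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in [c for c in range(n) if c not in pivots]:
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, p in zip(m, pivots):
            v[p] = -row[free]                 # solves x_p + row[free] * x_free = 0
        basis.append(v)
    return basis, len(pivots)

def apply(rows, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in rows]

examples = {
    "Example 9": [[1, -1, 0], [0, 0, 2]],
    "rank-deficient 3x4": [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 1, 0]],
}
for name, A in examples.items():
    basis, rank = null_space_basis(A)
    assert all(all(entry == 0 for entry in apply(A, v)) for v in basis)
    print(name, "nullity", len(basis), "+ rank", rank, "= dim", len(A[0]))
# Example 9 nullity 1 + rank 2 = dim 3
# rank-deficient 3x4 nullity 2 + rank 2 = dim 4
```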

The reader should review the concepts of “one-to-one” and “onto” pre- 
sented in Appendix B. Interestingly, for a linear transformation, both of these 
concepts are intimately connected to the rank and nullity of the transforma- 
tion. This is demonstrated in the next two theorems. 
Theorem 2.4. Let $V$ and $W$ be vector spaces, and let $T\colon V \to W$ be linear. Then $T$ is one-to-one if and only if $N(T) = \{0\}$.

Proof. Suppose that $T$ is one-to-one and $x \in N(T)$. Then $T(x) = 0 = T(0)$. Since $T$ is one-to-one, we have $x = 0$. Hence $N(T) = \{0\}$.

Now assume that $N(T) = \{0\}$, and suppose that $T(x) = T(y)$. Then $0 = T(x) - T(y) = T(x - y)$ by property 3 on page 65. Therefore $x - y \in N(T) = \{0\}$. So $x - y = 0$, or $x = y$. This means that $T$ is one-to-one. ∎


The reader should observe that Theorem 2.4 allows us to conclude that 
the transformation defined in Example 9 is not one-to-one. 

Surprisingly, the conditions of one-to-one and onto are equivalent in an 
important special case. 


Theorem 2.5. Let $V$ and $W$ be vector spaces of equal (finite) dimension, and let $T\colon V \to W$ be linear. Then the following are equivalent.
(a) $T$ is one-to-one.
(b) $T$ is onto.
(c) $\operatorname{rank}(T) = \dim(V)$.


Proof. From the dimension theorem, we have 
nullity(T) + rank(T) = dim(V). 


Now, with the use of Theorem 2.4, we have that T is one-to-one if and only if 
N(T) = {0}, if and only if nullity(T) = 0, if and only if rank(T) = dim(V), if 
and only if rank(T) = dim(W), and if and only if dim(R(T)) = dim(W). By 
Theorem 1.11 (p. 50), this equality is equivalent to R(T) = W, the definition 
of T being onto. 


We note that if $V$ is not finite-dimensional and $T\colon V \to V$ is linear, then it does not follow that one-to-one and onto are equivalent. (See Exercises 15, 16, and 21.)

The linearity of T in Theorems 2.4 and 2.5 is essential, for it is easy to 
construct examples of functions from R into R that are not one-to-one, but 
are onto, and vice versa. 

The next two examples make use of the preceding theorems in determining 
whether a given linear transformation is one-to-one or onto. 


Example 11
Let $T\colon P_2(\mathbb{R}) \to P_3(\mathbb{R})$ be the linear transformation defined by
$$T(f(x)) = 2f'(x) + \int_0^x 3f(t)\,dt.$$
Now
$$R(T) = \operatorname{span}(\{T(1), T(x), T(x^2)\}) = \operatorname{span}(\{3x,\; 2 + \tfrac{3}{2}x^2,\; 4x + x^3\}).$$

Since $\{3x,\, 2 + \frac{3}{2}x^2,\, 4x + x^3\}$ is linearly independent, $\operatorname{rank}(T) = 3$. Since $\dim(P_3(\mathbb{R})) = 4$, $T$ is not onto. From the dimension theorem, $\operatorname{nullity}(T) + 3 = 3$. So $\operatorname{nullity}(T) = 0$, and therefore, $N(T) = \{0\}$. We conclude from Theorem 2.4 that $T$ is one-to-one.
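A Python sketch of Example 11 on coefficient lists reproduces the three images and their rank. The encoding (index $k$ holds the coefficient of $x^k$) and the function names are illustrative assumptions.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def T(coeffs):
    """Example 11: T(f) = 2 f'(x) + (integral from 0 to x of 3 f(t) dt).

    coeffs = [a0, a1, a2] encodes a0 + a1 x + a2 x^2; the result has length 4.
    """
    out = [Fraction(0)] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        if k >= 1:
            out[k - 1] += 2 * k * Fraction(c)     # 2 * d/dx (c x^k) = 2kc x^(k-1)
        out[k + 1] += 3 * Fraction(c) / (k + 1)   # integral of 3c t^k is 3c x^(k+1)/(k+1)
    return out

images = [T(p) for p in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]   # T(1), T(x), T(x^2)
for img in images:
    print([str(c) for c in img])
# ['0', '3', '0', '0']      i.e. 3x
# ['2', '0', '3/2', '0']    i.e. 2 + (3/2)x^2
# ['0', '4', '0', '1']      i.e. 4x + x^3
print(rank(images))  # 3, while dim P_3(R) = 4
```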


Example 12
Let $T\colon F^2 \to F^2$ be the linear transformation defined by
$$T(a_1, a_2) = (a_1 + a_2,\, a_1).$$

It is easy to see that $N(T) = \{0\}$; so $T$ is one-to-one. Hence Theorem 2.5 tells us that $T$ must be onto.
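Over $\mathbb{Q}$ (a choice of $F$ made here so that integer arithmetic is exact), onto-ness in Example 12 can also be seen directly: the equations $a_1 + a_2 = b_1$ and $a_1 = b_2$ force $a_1 = b_2$ and $a_2 = b_1 - b_2$. The sketch below encodes this solution; `preimage` is a name we introduce.

```python
def T(a1, a2):
    """Example 12: T(a1, a2) = (a1 + a2, a1)."""
    return (a1 + a2, a1)

def preimage(b1, b2):
    """The unique (a1, a2) with T(a1, a2) = (b1, b2): a1 = b2, a2 = b1 - b2."""
    return (b2, b1 - b2)

for b in [(5, 2), (0, 0), (-3, 7)]:
    assert T(*preimage(*b)) == b   # every target vector is attained, so T is onto
print(preimage(0, 0))  # (0, 0): the only vector sent to 0, so N(T) = {0}
```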


In Exercise 14, it is stated that if T is linear and one-to-one, then a 
subset S is linearly independent if and only if T(S) is linearly independent. 
Example 13 illustrates the use of this result. 


Example 13
Let $T\colon P_2(\mathbb{R}) \to \mathbb{R}^3$ be the linear transformation defined by
$$T(a_0 + a_1x + a_2x^2) = (a_0, a_1, a_2).$$

Clearly $T$ is linear and one-to-one. Let $S = \{2 - x + 3x^2,\; x + x^2,\; 1 - 2x^2\}$. Then $S$ is linearly independent in $P_2(\mathbb{R})$ because
$$T(S) = \{(2, -1, 3), (0, 1, 1), (1, 0, -2)\}$$
is linearly independent in $\mathbb{R}^3$.
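One way to confirm the last claim of Example 13 is a determinant test: three vectors in $\mathbb{R}^3$ are linearly independent exactly when the $3 \times 3$ matrix having them as rows has nonzero determinant. Determinants are developed later in the book, so this sketch is only a cross-check; `det3` is our own helper.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient vectors T(S) for S = {2 - x + 3x^2, x + x^2, 1 - 2x^2}.
TS = [(2, -1, 3), (0, 1, 1), (1, 0, -2)]
print(det3(TS))  # -8, nonzero, so T(S) is linearly independent in R^3
```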


In Example 13, we transferred a property from the vector space of polyno- 
mials to a property in the vector space of 3-tuples. This technique is exploited 
more fully later. 

One of the most important properties of a linear transformation is that it is 
completely determined by its action on a basis. This result, which follows from 
the next theorem and corollary, is used frequently throughout the book. 


Theorem 2.6. Let $V$ and $W$ be vector spaces over $F$, and suppose that $\{v_1, v_2, \ldots, v_n\}$ is a basis for $V$. For $w_1, w_2, \ldots, w_n$ in $W$, there exists exactly one linear transformation $T\colon V \to W$ such that $T(v_i) = w_i$ for $i = 1, 2, \ldots, n$.
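Proof. Let $x \in V$. Then
$$x = \sum_{i=1}^{n} a_i v_i,$$
where $a_1, a_2, \ldots, a_n$ are unique scalars. Define
$$T\colon V \to W \quad\text{by}\quad T(x) = \sum_{i=1}^{n} a_i w_i.$$

(a) $T$ is linear: Suppose that $u, v \in V$ and $d \in F$. Then we may write
$$u = \sum_{i=1}^{n} b_i v_i \quad\text{and}\quad v = \sum_{i=1}^{n} c_i v_i$$
for some scalars $b_1, b_2, \ldots, b_n, c_1, c_2, \ldots, c_n$. Thus
$$du + v = \sum_{i=1}^{n} (db_i + c_i) v_i.$$
So
$$T(du + v) = \sum_{i=1}^{n} (db_i + c_i) w_i = d\sum_{i=1}^{n} b_i w_i + \sum_{i=1}^{n} c_i w_i = dT(u) + T(v).$$

(b) Clearly
$$T(v_i) = w_i \quad\text{for } i = 1, 2, \ldots, n.$$

(c) $T$ is unique: Suppose that $U\colon V \to W$ is linear and $U(v_i) = w_i$ for $i = 1, 2, \ldots, n$. Then for $x \in V$ with
$$x = \sum_{i=1}^{n} a_i v_i,$$
we have
$$U(x) = \sum_{i=1}^{n} a_i U(v_i) = \sum_{i=1}^{n} a_i w_i = T(x).$$
Hence $U = T$. ∎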


Corollary. Let $V$ and $W$ be vector spaces, and suppose that $V$ has a finite basis $\{v_1, v_2, \ldots, v_n\}$. If $U, T\colon V \to W$ are linear and $U(v_i) = T(v_i)$ for $i = 1, 2, \ldots, n$, then $U = T$.
Example 14
Let $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation defined by
$$T(a_1, a_2) = (2a_2 - a_1,\, 3a_1),$$
and suppose that $U\colon \mathbb{R}^2 \to \mathbb{R}^2$ is linear. If we know that $U(1, 2) = (3, 3)$ and $U(1, 1) = (1, 3)$, then $U = T$. This follows from the corollary, since also $T(1, 2) = (3, 3)$ and $T(1, 1) = (1, 3)$, and from the fact that $\{(1, 2), (1, 1)\}$ is a basis for $\mathbb{R}^2$.
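A Python sketch of Theorem 2.6 applied to Example 14: `U` is built only from its prescribed values on the basis $\{(1, 2), (1, 1)\}$, by first finding coordinates (here via Cramer's rule for a $2 \times 2$ system, an implementation choice of ours) and then forming the matching combination of images. It agrees with $T$ on the standard basis, which by the corollary already forces $U = T$; the other vectors are just a spot check.

```python
from fractions import Fraction

def T(a1, a2):
    """Example 14: T(a1, a2) = (2*a2 - a1, 3*a1)."""
    return (2 * a2 - a1, 3 * a1)

# U is specified only by its values on the basis {v1, v2}.
v1, w1 = (1, 2), (3, 3)
v2, w2 = (1, 1), (1, 3)

def coordinates(x):
    """Scalars (c1, c2) with x = c1*v1 + c2*v2, by Cramer's rule."""
    det = Fraction(v1[0] * v2[1] - v2[0] * v1[1])   # nonzero since {v1, v2} is a basis
    c1 = (x[0] * v2[1] - v2[0] * x[1]) / det
    c2 = (v1[0] * x[1] - x[0] * v1[1]) / det
    return c1, c2

def U(a1, a2):
    """The linear map of Theorem 2.6 with U(v1) = w1 and U(v2) = w2."""
    c1, c2 = coordinates((a1, a2))
    return (c1 * w1[0] + c2 * w2[0], c1 * w1[1] + c2 * w2[1])

for x in [(1, 0), (0, 1), (4, -7), (5, 2)]:
    assert U(*x) == T(*x)
print("U agrees with T on all test vectors")
```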


EXERCISES

1. Label the following statements as true or false. In each part, $V$ and $W$ are finite-dimensional vector spaces (over $F$), and $T$ is a function from $V$ to $W$.

(a) If $T$ is linear, then $T$ preserves sums and scalar products.
(b) If $T(x + y) = T(x) + T(y)$, then $T$ is linear.
(c) $T$ is one-to-one if and only if the only vector $x$ such that $T(x) = 0$ is $x = 0$.
(d) If $T$ is linear, then $T(0_V) = 0_W$.
(e) If $T$ is linear, then $\operatorname{nullity}(T) + \operatorname{rank}(T) = \dim(W)$.
(f) If $T$ is linear, then $T$ carries linearly independent subsets of $V$ onto linearly independent subsets of $W$.
(g) If $T, U\colon V \to W$ are both linear and agree on a basis for $V$, then $T = U$.
(h) Given $x_1, x_2 \in V$ and $y_1, y_2 \in W$, there exists a linear transformation $T\colon V \to W$ such that $T(x_1) = y_1$ and $T(x_2) = y_2$.


For Exercises 2 through 6, prove that T is a linear transformation, and find 
bases for both N(T) and R(T). Then compute the nullity and rank of T, and 
verify the dimension theorem. Finally, use the appropriate theorems in this 
section to determine whether T is one-to-one or onto. 


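2. $T\colon \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(a_1, a_2, a_3) = (a_1 - a_2,\, 2a_3)$.

3. $T\colon \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T(a_1, a_2) = (a_1 + a_2,\, 0,\, 2a_1 - a_2)$.

4. $T\colon M_{2\times 3}(F) \to M_{2\times 2}(F)$ defined by
$$T\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} 2a_{11} - a_{12} & a_{13} + 2a_{12} \\ 0 & 0 \end{pmatrix}.$$

5. $T\colon P_2(\mathbb{R}) \to P_3(\mathbb{R})$ defined by $T(f(x)) = xf(x) + f'(x)$.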
6. $T\colon M_{n\times n}(F) \to F$ defined by $T(A) = \operatorname{tr}(A)$. Recall (Example 4, Section 1.3) that
$$\operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}.$$

7. Prove properties 1, 2, 3, and 4 on page 65.

8. Prove that the transformations in Examples 2 and 3 are linear.


9. In this exercise, $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ is a function. For each of the following parts, state why $T$ is not linear.

(a) $T(a_1, a_2) = (1, a_2)$
(b) $T(a_1, a_2) = (a_1, a_1^2)$
(c) $T(a_1, a_2) = (\sin a_1, 0)$
(d) $T(a_1, a_2) = (|a_1|, a_2)$
(e) $T(a_1, a_2) = (a_1 + 1, a_2)$


10. Suppose that $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ is linear, $T(1, 0) = (1, 4)$, and $T(1, 1) = (2, 5)$. What is $T(2, 3)$? Is $T$ one-to-one?

11. Prove that there exists a linear transformation $T\colon \mathbb{R}^2 \to \mathbb{R}^3$ such that $T(1, 1) = (1, 0, 2)$ and $T(2, 3) = (1, -1, 4)$. What is $T(8, 11)$?

12. Is there a linear transformation $T\colon \mathbb{R}^3 \to \mathbb{R}^2$ such that $T(1, 0, 3) = (1, 1)$ and $T(-2, 0, -6) = (2, 1)$?

13. Let $V$ and $W$ be vector spaces, let $T\colon V \to W$ be linear, and let $\{w_1, w_2, \ldots, w_k\}$ be a linearly independent subset of $R(T)$. Prove that if $S = \{v_1, v_2, \ldots, v_k\}$ is chosen so that $T(v_i) = w_i$ for $i = 1, 2, \ldots, k$, then $S$ is linearly independent.

14. Let $V$ and $W$ be vector spaces and $T\colon V \to W$ be linear.

(a) Prove that $T$ is one-to-one if and only if $T$ carries linearly independent subsets of $V$ onto linearly independent subsets of $W$.
(b) Suppose that $T$ is one-to-one and that $S$ is a subset of $V$. Prove that $S$ is linearly independent if and only if $T(S)$ is linearly independent.
(c) Suppose $\beta = \{v_1, v_2, \ldots, v_n\}$ is a basis for $V$ and $T$ is one-to-one and onto. Prove that $T(\beta) = \{T(v_1), T(v_2), \ldots, T(v_n)\}$ is a basis for $W$.

15. Recall the definition of $P(\mathbb{R})$ on page 10. Define
$$T\colon P(\mathbb{R}) \to P(\mathbb{R}) \quad\text{by}\quad T(f(x)) = \int_0^x f(t)\,dt.$$
Prove that $T$ is linear and one-to-one, but not onto.
16. Let $T\colon P(\mathbb{R}) \to P(\mathbb{R})$ be defined by $T(f(x)) = f'(x)$. Recall that $T$ is linear. Prove that $T$ is onto, but not one-to-one.

17. Let $V$ and $W$ be finite-dimensional vector spaces and $T\colon V \to W$ be linear.

(a) Prove that if $\dim(V) < \dim(W)$, then $T$ cannot be onto.
(b) Prove that if $\dim(V) > \dim(W)$, then $T$ cannot be one-to-one.

18. Give an example of a linear transformation $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ such that $N(T) = R(T)$.

19. Give an example of distinct linear transformations $T$ and $U$ such that $N(T) = N(U)$ and $R(T) = R(U)$.

20. Let $V$ and $W$ be vector spaces with subspaces $V_1$ and $W_1$, respectively. If $T\colon V \to W$ is linear, prove that $T(V_1)$ is a subspace of $W$ and that $\{x \in V : T(x) \in W_1\}$ is a subspace of $V$.


21. Let $V$ be the vector space of sequences described in Example 5 of Section 1.2. Define the functions $T, U\colon V \to V$ by
$$T(a_1, a_2, \ldots) = (a_2, a_3, \ldots) \quad\text{and}\quad U(a_1, a_2, \ldots) = (0, a_1, a_2, \ldots).$$
$T$ and $U$ are called the left shift and right shift operators on $V$, respectively.

(a) Prove that $T$ and $U$ are linear.
(b) Prove that $T$ is onto, but not one-to-one.
(c) Prove that $U$ is one-to-one, but not onto.

22. Let $T\colon \mathbb{R}^3 \to \mathbb{R}$ be linear. Show that there exist scalars $a$, $b$, and $c$ such that $T(x, y, z) = ax + by + cz$ for all $(x, y, z) \in \mathbb{R}^3$. Can you generalize this result for $T\colon F^n \to F$? State and prove an analogous result for $T\colon F^n \to F^m$.

23. Let $T\colon \mathbb{R}^3 \to \mathbb{R}$ be linear. Describe geometrically the possibilities for the null space of $T$. Hint: Use Exercise 22.


The following definition is used in Exercises 24-27 and in Exercise 30.

Definition. Let $V$ be a vector space and $W_1$ and $W_2$ be subspaces of $V$ such that $V = W_1 \oplus W_2$. (Recall the definition of direct sum given in the exercises of Section 1.3.) A function $T\colon V \to V$ is called the projection on $W_1$ along $W_2$ if, for $x = x_1 + x_2$ with $x_1 \in W_1$ and $x_2 \in W_2$, we have $T(x) = x_1$.

24. Let $T\colon \mathbb{R}^2 \to \mathbb{R}^2$. Include figures for each of the following parts.

(a) Find a formula for $T(a, b)$, where $T$ represents the projection on the $y$-axis along the $x$-axis.
(b) Find a formula for $T(a, b)$, where $T$ represents the projection on the $y$-axis along the line $L = \{(s, s) : s \in \mathbb{R}\}$.


25. Let $T\colon \mathbb{R}^3 \to \mathbb{R}^3$.

(a) If $T(a, b, c) = (a, b, 0)$, show that $T$ is the projection on the $xy$-plane along the $z$-axis.
(b) Find a formula for $T(a, b, c)$, where $T$ represents the projection on the $z$-axis along the $xy$-plane.
(c) If $T(a, b, c) = (a - c, b, 0)$, show that $T$ is the projection on the $xy$-plane along the line $L = \{(a, 0, a) : a \in \mathbb{R}\}$.

26. Using the notation in the definition above, assume that $T\colon V \to V$ is the projection on $W_1$ along $W_2$.

(a) Prove that $T$ is linear and $W_1 = \{x \in V : T(x) = x\}$.
(b) Prove that $W_1 = R(T)$ and $W_2 = N(T)$.
(c) Describe $T$ if $W_1 = V$.
(d) Describe $T$ if $W_1$ is the zero subspace.

27. Suppose that $W$ is a subspace of a finite-dimensional vector space $V$.

(a) Prove that there exist a subspace $W'$ and a function $T\colon V \to V$ such that $T$ is a projection on $W$ along $W'$.
(b) Give an example of a subspace $W$ of a vector space $V$ such that there are two projections on $W$ along two (distinct) subspaces.


The following definitions are used in Exercises 28-32.

Definitions. Let $V$ be a vector space, and let $T\colon V \to V$ be linear. A subspace $W$ of $V$ is said to be $T$-invariant if $T(x) \in W$ for every $x \in W$, that is, $T(W) \subseteq W$. If $W$ is $T$-invariant, we define the restriction of $T$ on $W$ to be the function $T_W\colon W \to W$ defined by $T_W(x) = T(x)$ for all $x \in W$.

Exercises 28-32 assume that $W$ is a subspace of a vector space $V$ and that $T\colon V \to V$ is linear. Warning: Do not assume that $W$ is $T$-invariant or that $T$ is a projection unless explicitly stated.

28. Prove that the subspaces $\{0\}$, $V$, $R(T)$, and $N(T)$ are all $T$-invariant.

29. If $W$ is $T$-invariant, prove that $T_W$ is linear.

30. Suppose that $T$ is the projection on $W$ along some subspace $W'$. Prove that $W$ is $T$-invariant and that $T_W = I_W$.

31. Suppose that $V = R(T) \oplus W$ and $W$ is $T$-invariant. (Recall the definition of direct sum given in the exercises of Section 1.3.)

(a) Prove that $W \subseteq N(T)$.
(b) Show that if $V$ is finite-dimensional, then $W = N(T)$.
(c) Show by example that the conclusion of (b) is not necessarily true if $V$ is not finite-dimensional.

32. Suppose that $W$ is $T$-invariant. Prove that $N(T_W) = N(T) \cap W$ and $R(T_W) = T(W)$.

33. Prove Theorem 2.2 for the case that $\beta$ is infinite, that is, $R(T) = \operatorname{span}(\{T(v) : v \in \beta\})$.

34. Prove the following generalization of Theorem 2.6: Let $V$ and $W$ be vector spaces over a common field, and let $\beta$ be a basis for $V$. Then for any function $f\colon \beta \to W$ there exists exactly one linear transformation $T\colon V \to W$ such that $T(x) = f(x)$ for all $x \in \beta$.


Exercises 35 and 36 assume the definition of direct sum given in the exercises 
of Section 1.3. 


35. Let $V$ be a finite-dimensional vector space and $T\colon V \to V$ be linear.

(a) Suppose that $V = R(T) + N(T)$. Prove that $V = R(T) \oplus N(T)$.
(b) Suppose that $R(T) \cap N(T) = \{0\}$. Prove that $V = R(T) \oplus N(T)$.

Be careful to say in each part where finite-dimensionality is used.

36. Let $V$ and $T$ be as defined in Exercise 21.

(a) Prove that $V = R(T) + N(T)$, but $V$ is not a direct sum of these two spaces. Thus the result of Exercise 35(a) above cannot be proved without assuming that $V$ is finite-dimensional.
(b) Find a linear operator $T_1$ on $V$ such that $R(T_1) \cap N(T_1) = \{0\}$ but $V$ is not a direct sum of $R(T_1)$ and $N(T_1)$. Conclude that $V$ being finite-dimensional is also essential in Exercise 35(b).

37. A function $T\colon V \to W$ between vector spaces $V$ and $W$ is called additive if $T(x + y) = T(x) + T(y)$ for all $x, y \in V$. Prove that if $V$ and $W$ are vector spaces over the field of rational numbers, then any additive function from $V$ into $W$ is a linear transformation.

38. Let $T\colon \mathbb{C} \to \mathbb{C}$ be the function defined by $T(z) = \bar{z}$. Prove that $T$ is additive (as defined in Exercise 37) but not linear.

39. Prove that there is an additive function $T\colon \mathbb{R} \to \mathbb{R}$ (as defined in Exercise 37) that is not linear. Hint: Let $V$ be the set of real numbers regarded as a vector space over the field of rational numbers. By the corollary to Theorem 1.13 (p. 60), $V$ has a basis $\beta$. Let $x$ and $y$ be two distinct vectors in $\beta$, and define $f\colon \beta \to V$ by $f(x) = y$, $f(y) = x$, and $f(z) = z$ otherwise. By Exercise 34, there exists a linear transformation $T\colon V \to V$ such that $T(u) = f(u)$ for all $u \in \beta$. Then $T$ is additive, but for $c = y/x$, $T(cx) \neq cT(x)$.


The following exercise requires familiarity with the definition of quotient space 
given in Exercise 31 of Section 1.3. 


40. Let $V$ be a vector space and $W$ be a subspace of $V$. Define the mapping $\eta\colon V \to V/W$ by $\eta(v) = v + W$ for $v \in V$.

(a) Prove that $\eta$ is a linear transformation from $V$ onto $V/W$ and that $N(\eta) = W$.
(b) Suppose that $V$ is finite-dimensional. Use (a) and the dimension theorem to derive a formula relating $\dim(V)$, $\dim(W)$, and $\dim(V/W)$.
(c) Read the proof of the dimension theorem. Compare the method of solving (b) with the method of deriving the same result as outlined in Exercise 35 of Section 1.6.


2.2. THE MATRIX REPRESENTATION OF A LINEAR 
TRANSFORMATION 


Until now, we have studied linear transformations by examining their ranges 
and null spaces. In this section, we embark on one of the most useful ap- 
proaches to the analysis of a linear transformation on a finite-dimensional 
vector space: the representation of a linear transformation by a matrix. In 
fact, we develop a one-to-one correspondence between matrices and linear 
transformations that allows us to utilize properties of one to study properties 
of the other. 
We first need the concept of an ordered basis for a vector space. 


Definition. Let V be a finite-dimensional vector space. An ordered 
basis for V is a basis for V endowed with a specific order; that is, an ordered 
basis for V is a finite sequence of linearly independent vectors in V that 
generates V. 


Example 1

In $F^3$, $\beta = \{e_1, e_2, e_3\}$ can be considered an ordered basis. Also $\gamma = \{e_2, e_1, e_3\}$ is an ordered basis, but $\beta \neq \gamma$ as ordered bases.


For the vector space $F^n$, we call $\{e_1, e_2, \ldots, e_n\}$ the standard ordered basis for $F^n$. Similarly, for the vector space $P_n(F)$, we call $\{1, x, \ldots, x^n\}$ the standard ordered basis for $P_n(F)$.

Now that we have the concept of ordered basis, we can identify abstract 
vectors in an n-dimensional vector space with n-tuples. This identification is 
provided through the use of coordinate vectors, as introduced next. 
Definition. Let $\beta = \{u_1, u_2, \ldots, u_n\}$ be an ordered basis for a finite-dimensional vector space $V$. For $x \in V$, let $a_1, a_2, \ldots, a_n$ be the unique scalars such that
$$x = \sum_{i=1}^{n} a_i u_i.$$
We define the coordinate vector of $x$ relative to $\beta$, denoted $[x]_\beta$, by
$$[x]_\beta = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.$$

Notice that $[u_i]_\beta = e_i$ in the preceding definition. It is left as an exercise to show that the correspondence $x \mapsto [x]_\beta$ provides us with a linear transformation from $V$ to $F^n$. We study this transformation in Section 2.4 in more detail.
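Computing $[x]_\beta$ amounts to solving the linear system $\sum_i a_i u_i = x$. A minimal sketch for vectors in $\mathbb{Q}^n$, using Gauss-Jordan elimination with exact fractions; the function name and the basis in the usage lines are hypothetical examples, not taken from the text.

```python
from fractions import Fraction

def coordinate_vector(basis, x):
    """Return [x]_beta as a list: the scalars a_i with x = sum_i a_i * basis[i].

    Vectors are tuples in Q^n; the n x n system is solved by Gauss-Jordan elimination.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis[j], the last column holds x.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])] for i in range(n)]
    for col in range(n):
        # A nonzero pivot exists because the basis vectors are linearly independent.
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for i in range(n):
            if i != col and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[col])]
    return [row[n] for row in m]

beta = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]   # a hypothetical ordered basis of Q^3
print([str(a) for a in coordinate_vector(beta, (3, 5, 2))])  # ['-2', '3', '2']
print([str(a) for a in coordinate_vector(beta, beta[1])])    # ['0', '1', '0'], i.e. e_2
```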


Example 2

Let $V = P_2(\mathbb{R})$, and let $\beta = \{1, x, x^2\}$ be the standard ordered basis for $V$. If $f(x) = 4 + 6x - 7x^2$, then
$$[f]_\beta = \begin{pmatrix} 4 \\ 6 \\ -7 \end{pmatrix}.$$

Let us now proceed with the promised matrix representation of a linear transformation. Suppose that $V$ and $W$ are finite-dimensional vector spaces with ordered bases $\beta = \{v_1, v_2, \ldots, v_n\}$ and $\gamma = \{w_1, w_2, \ldots, w_m\}$, respectively. Let $T\colon V \to W$ be linear. Then for each $j$, $1 \le j \le n$, there exist unique scalars $a_{ij} \in F$, $1 \le i \le m$, such that
$$T(v_j) = \sum_{i=1}^{m} a_{ij} w_i \quad\text{for } 1 \le j \le n.$$

Definition. Using the notation above, we call the $m \times n$ matrix $A$ defined by $A_{ij} = a_{ij}$ the matrix representation of $T$ in the ordered bases $\beta$ and $\gamma$ and write $A = [T]_\beta^\gamma$. If $V = W$ and $\beta = \gamma$, then we write $A = [T]_\beta$.


Notice that the $j$th column of $A$ is simply $[T(v_j)]_\gamma$. Also observe that if $U\colon V \to W$ is a linear transformation such that $[U]_\beta^\gamma = [T]_\beta^\gamma$, then $U = T$ by the corollary to Theorem 2.6 (p. 73).
We illustrate the computation of $[T]_\beta^\gamma$ in the next several examples.
Example 3
Let $T\colon \mathbb{R}^2 \to \mathbb{R}^3$ be the linear transformation defined by
$$T(a_1, a_2) = (a_1 + 3a_2,\, 0,\, 2a_1 - 4a_2).$$
Let $\beta$ and $\gamma$ be the standard ordered bases for $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively. Now
$$T(1, 0) = (1, 0, 2) = 1e_1 + 0e_2 + 2e_3$$
and
$$T(0, 1) = (3, 0, -4) = 3e_1 + 0e_2 - 4e_3.$$
Hence
$$[T]_\beta^\gamma = \begin{pmatrix} 1 & 3 \\ 0 & 0 \\ 2 & -4 \end{pmatrix}.$$
If we let $\gamma' = \{e_3, e_2, e_1\}$, then
$$[T]_\beta^{\gamma'} = \begin{pmatrix} 2 & -4 \\ 0 & 0 \\ 1 & 3 \end{pmatrix}.$$
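The rule that column $j$ is $[T(v_j)]_\gamma$ translates directly into code. In this sketch `matrix_rep` takes the map, the list $\beta$, and a function returning $\gamma$-coordinates. For the standard basis of $\mathbb{R}^3$ and its reordering $\gamma'$, those coordinates are just the components, possibly reversed; that shortcut is an assumption built into the two coordinate helpers, which are our own names.

```python
def matrix_rep(f, beta, coords_gamma):
    """[f]_beta^gamma: column j holds the gamma-coordinates of f(beta[j])."""
    columns = [coords_gamma(f(v)) for v in beta]
    return [list(row) for row in zip(*columns)]   # turn the columns into rows

def T(v):
    """Example 3: T(a1, a2) = (a1 + 3a2, 0, 2a1 - 4a2)."""
    a1, a2 = v
    return (a1 + 3 * a2, 0, 2 * a1 - 4 * a2)

def coords_gamma(w):
    """Coordinates relative to gamma = {e1, e2, e3}: the components themselves."""
    return list(w)

def coords_gamma_prime(w):
    """Coordinates relative to gamma' = {e3, e2, e1}: the components reversed."""
    return list(w)[::-1]

beta = [(1, 0), (0, 1)]
print(matrix_rep(T, beta, coords_gamma))        # [[1, 3], [0, 0], [2, -4]]
print(matrix_rep(T, beta, coords_gamma_prime))  # [[2, -4], [0, 0], [1, 3]]
```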


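Example 4

Let $T\colon P_3(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by $T(f(x)) = f'(x)$. Let $\beta$ and $\gamma$ be the standard ordered bases for $P_3(\mathbb{R})$ and $P_2(\mathbb{R})$, respectively. Then
$$\begin{aligned} T(1) &= 0 \cdot 1 + 0 \cdot x + 0 \cdot x^2 \\ T(x) &= 1 \cdot 1 + 0 \cdot x + 0 \cdot x^2 \\ T(x^2) &= 0 \cdot 1 + 2 \cdot x + 0 \cdot x^2 \\ T(x^3) &= 0 \cdot 1 + 0 \cdot x + 3 \cdot x^2. \end{aligned}$$
So
$$[T]_\beta^\gamma = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$

Note that when $T(x^j)$ is written as a linear combination of the vectors of $\gamma$, its coefficients give the entries of column $j + 1$ of $[T]_\beta^\gamma$.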
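The pattern in Example 4 generalizes. With standard ordered bases, $T(x^j) = jx^{j-1}$, so the matrix of differentiation from $P_n(\mathbb{R})$ to $P_{n-1}(\mathbb{R})$ has the entries $1, 2, \ldots, n$ just above the main diagonal and zeros elsewhere. A short sketch (the function name is ours):

```python
def derivative_matrix(n):
    """[T]_beta^gamma for T = d/dx from P_n(R) to P_{n-1}(R), standard ordered bases.

    T(x^j) = j x^(j-1), so (0-based) column j has the single entry j in row j - 1.
    """
    rows, cols = n, n + 1
    A = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        A[j - 1][j] = j
    return A

for row in derivative_matrix(3):
    print(row)
# [0, 1, 0, 0]
# [0, 0, 2, 0]
# [0, 0, 0, 3]
```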
Now that we have defined a procedure for associating matrices with linear transformations, we show in Theorem 2.8 that this association “preserves” addition and scalar multiplication. To make this more explicit, we need some preliminary discussion about the addition and scalar multiplication of linear transformations.

Definition. Let $T, U\colon V \to W$ be arbitrary functions, where $V$ and $W$ are vector spaces over $F$, and let $a \in F$. We define $T + U\colon V \to W$ by $(T + U)(x) = T(x) + U(x)$ for all $x \in V$, and $aT\colon V \to W$ by $(aT)(x) = aT(x)$ for all $x \in V$.

Of course, these are just the usual definitions of addition and scalar multiplication of functions. We are fortunate, however, to have the result that both sums and scalar multiples of linear transformations are also linear.


Theorem 2.7. Let $V$ and $W$ be vector spaces over a field $F$, and let $T, U\colon V \to W$ be linear.
(a) For all $a \in F$, $aT + U$ is linear.
(b) Using the operations of addition and scalar multiplication in the preceding definition, the collection of all linear transformations from $V$ to $W$ is a vector space over $F$.


Proof. (a) Let $x, y \in V$ and $c \in F$. Then
$$\begin{aligned} (aT + U)(cx + y) &= aT(cx + y) + U(cx + y) \\ &= a[T(cx + y)] + cU(x) + U(y) \\ &= a[cT(x) + T(y)] + cU(x) + U(y) \\ &= acT(x) + cU(x) + aT(y) + U(y) \\ &= c(aT + U)(x) + (aT + U)(y). \end{aligned}$$
So $aT + U$ is linear.

(b) Noting that $T_0$, the zero transformation, plays the role of the zero vector, it is easy to verify that the axioms of a vector space are satisfied, and hence that the collection of all linear transformations from $V$ into $W$ is a vector space over $F$. ∎


Definitions. Let $V$ and $W$ be vector spaces over $F$. We denote the vector space of all linear transformations from $V$ into $W$ by $\mathcal{L}(V, W)$. In the case that $V = W$, we write $\mathcal{L}(V)$ instead of $\mathcal{L}(V, W)$.

In Section 2.4, we see a complete identification of $\mathcal{L}(V, W)$ with the vector space $M_{m\times n}(F)$, where $n$ and $m$ are the dimensions of $V$ and $W$, respectively. This identification is easily established by the use of the next theorem.


Theorem 2.8. Let $V$ and $W$ be finite-dimensional vector spaces with ordered bases $\beta$ and $\gamma$, respectively, and let $T, U\colon V \to W$ be linear transformations. Then
(a) $[T + U]_\beta^\gamma = [T]_\beta^\gamma + [U]_\beta^\gamma$ and
(b) $[aT]_\beta^\gamma = a[T]_\beta^\gamma$ for all scalars $a$.

Proof. Let $\beta = \{v_1, v_2, \ldots, v_n\}$ and $\gamma = \{w_1, w_2, \ldots, w_m\}$. There exist unique scalars $a_{ij}$ and $b_{ij}$ ($1 \le i \le m$, $1 \le j \le n$) such that
$$T(v_j) = \sum_{i=1}^{m} a_{ij} w_i \quad\text{and}\quad U(v_j) = \sum_{i=1}^{m} b_{ij} w_i \quad\text{for } 1 \le j \le n.$$
Hence
$$(T + U)(v_j) = \sum_{i=1}^{m} (a_{ij} + b_{ij}) w_i.$$
Thus
$$([T + U]_\beta^\gamma)_{ij} = a_{ij} + b_{ij} = ([T]_\beta^\gamma + [U]_\beta^\gamma)_{ij}.$$
So (a) is proved, and the proof of (b) is similar. ∎
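Example 5

Let $T\colon \mathbb{R}^2 \to \mathbb{R}^3$ and $U\colon \mathbb{R}^2 \to \mathbb{R}^3$ be the linear transformations respectively defined by
$$T(a_1, a_2) = (a_1 + 3a_2,\, 0,\, 2a_1 - 4a_2) \quad\text{and}\quad U(a_1, a_2) = (a_1 - a_2,\, 2a_1,\, 3a_1 + 2a_2).$$
Let $\beta$ and $\gamma$ be the standard ordered bases of $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively. Then
$$[T]_\beta^\gamma = \begin{pmatrix} 1 & 3 \\ 0 & 0 \\ 2 & -4 \end{pmatrix}$$
(as computed in Example 3), and
$$[U]_\beta^\gamma = \begin{pmatrix} 1 & -1 \\ 2 & 0 \\ 3 & 2 \end{pmatrix}.$$
If we compute $T + U$ using the preceding definitions, we obtain
$$(T + U)(a_1, a_2) = (2a_1 + 2a_2,\, 2a_1,\, 5a_1 - 2a_2).$$
So
$$[T + U]_\beta^\gamma = \begin{pmatrix} 2 & 2 \\ 2 & 0 \\ 5 & -2 \end{pmatrix},$$
which is simply $[T]_\beta^\gamma + [U]_\beta^\gamma$, illustrating Theorem 2.8.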
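A numerical restatement of Example 5 in Python, reading off each standard-basis matrix column by column from the images $T(e_j)$. It checks Theorem 2.8(a) for this one pair of maps and part (b) for one scalar; the helper names are our own.

```python
def matrix_rep_standard(f, n):
    """Matrix of f: R^n -> R^m in the standard bases; column j is f(e_j)."""
    columns = [f(tuple(1 if i == j else 0 for i in range(n))) for j in range(n)]
    return [list(row) for row in zip(*columns)]

def T(v):
    a1, a2 = v
    return (a1 + 3 * a2, 0, 2 * a1 - 4 * a2)

def U(v):
    a1, a2 = v
    return (a1 - a2, 2 * a1, 3 * a1 + 2 * a2)

def T_plus_U(v):
    """(T + U)(x) = T(x) + U(x), as in the definition preceding Theorem 2.7."""
    return tuple(t + u for t, u in zip(T(v), U(v)))

def three_T(v):
    """(3T)(x) = 3 T(x)."""
    return tuple(3 * t for t in T(v))

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A, B = matrix_rep_standard(T, 2), matrix_rep_standard(U, 2)
C = matrix_rep_standard(T_plus_U, 2)
print(C)                   # [[2, 2], [2, 0], [5, -2]]
print(C == mat_add(A, B))  # True, as Theorem 2.8(a) predicts
print(matrix_rep_standard(three_T, 2) == [[3 * a for a in row] for row in A])  # True
```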
EXERCISES

1. Label the following statements as true or false. Assume that $V$ and $W$ are finite-dimensional vector spaces with ordered bases $\beta$ and $\gamma$, respectively, and $T, U\colon V \to W$ are linear transformations.

(a) For any scalar $a$, $aT + U$ is a linear transformation from $V$ to $W$.
(b) $[T]_\beta^\gamma = [U]_\beta^\gamma$ implies that $T = U$.
(c) If $m = \dim(V)$ and $n = \dim(W)$, then $[T]_\beta^\gamma$ is an $m \times n$ matrix.
(d) $[T + U]_\beta^\gamma = [T]_\beta^\gamma + [U]_\beta^\gamma$.
(e) $\mathcal{L}(V, W)$ is a vector space.
(f) $\mathcal{L}(V, W) = \mathcal{L}(W, V)$.

2. Let $\beta$ and $\gamma$ be the standard ordered bases for $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. For each linear transformation $T\colon \mathbb{R}^n \to \mathbb{R}^m$, compute $[T]_\beta^\gamma$.

(a) $T\colon \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T(a_1, a_2) = (2a_1 - a_2,\, 3a_1 + 4a_2,\, a_1)$.
(b) $T\colon \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(a_1, a_2, a_3) = (2a_1 + 3a_2 - a_3,\, a_1 + a_3)$.
(c) $T\colon \mathbb{R}^3 \to \mathbb{R}$ defined by $T(a_1, a_2, a_3) = 2a_1 + a_2 - 3a_3$.
(d) $T\colon \mathbb{R}^3 \to \mathbb{R}^3$ defined by
$$T(a_1, a_2, a_3) = (2a_2 + a_3,\, -a_1 + 4a_2 + 5a_3,\, a_1 + a_3).$$
(e) $T\colon \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(a_1, a_2, \ldots, a_n) = (a_1, a_1, \ldots, a_1)$.
(f) $T\colon \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(a_1, a_2, \ldots, a_n) = (a_n, a_{n-1}, \ldots, a_1)$.
(g) $T\colon \mathbb{R}^n \to \mathbb{R}$ defined by $T(a_1, a_2, \ldots, a_n) = a_1 + a_n$.

3. Let $T\colon \mathbb{R}^2 \to \mathbb{R}^3$ be defined by $T(a_1, a_2) = (a_1 - a_2,\, a_1,\, 2a_1 + a_2)$. Let $\beta$ be the standard ordered basis for $\mathbb{R}^2$ and $\gamma = \{(1, 1, 0), (0, 1, 1), (2, 2, 3)\}$. Compute $[T]_\beta^\gamma$. If $\alpha = \{(1, 2), (2, 3)\}$, compute $[T]_\alpha^\gamma$.

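4. Define
$$T\colon M_{2\times 2}(\mathbb{R}) \to P_2(\mathbb{R}) \quad\text{by}\quad T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = (a + b) + (2d)x + bx^2.$$
Let
$$\beta = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\} \quad\text{and}\quad \gamma = \{1, x, x^2\}.$$
Compute $[T]_\beta^\gamma$.

5. Let
$$\alpha = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}, \quad \beta = \{1, x, x^2\}, \quad\text{and}\quad \gamma = \{1\}.$$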
(a) Define $T\colon M_{2\times 2}(F) \to M_{2\times 2}(F)$ by $T(A) = A^t$. Compute $[T]_\alpha$.
(b) Define $T\colon P_2(\mathbb{R}) \to M_{2\times 2}(\mathbb{R})$ by $T(f(x)) = \cdots$, where $'$ denotes differentiation. Compute $[T]_\beta^\alpha$.
(c) Define $T\colon M_{2\times 2}(F) \to F$ by $T(A) = \operatorname{tr}(A)$. Compute $[T]_\alpha^\gamma$.
(d) Define $T\colon P_2(\mathbb{R}) \to \mathbb{R}$ by $T(f(x)) = f(2)$. Compute $[T]_\beta^\gamma$.
(e) If $A = \cdots$, compute $[A]_\alpha$.
(f) If $f(x) = 3 - 6x + x^2$, compute $[f(x)]_\beta$.
(g) For $a \in F$, compute $[a]_\gamma$.


6. Complete the proof of part (b) of Theorem 2.7.

7. Prove part (b) of Theorem 2.8.

8. Let $V$ be an $n$-dimensional vector space with an ordered basis $\beta$. Define $T\colon V \to F^n$ by $T(x) = [x]_\beta$. Prove that $T$ is linear.

9. Let $V$ be the vector space of complex numbers over the field $\mathbb{R}$. Define $T\colon V \to V$ by $T(z) = \bar{z}$, where $\bar{z}$ is the complex conjugate of $z$. Prove that $T$ is linear, and compute $[T]_\beta$, where $\beta = \{1, i\}$. (Recall by Exercise 38 of Section 2.1 that $T$ is not linear if $V$ is regarded as a vector space over the field $\mathbb{C}$.)

10. Let $V$ be a vector space with the ordered basis $\beta = \{v_1, v_2, \ldots, v_n\}$. Define $v_0 = 0$. By Theorem 2.6 (p. 72), there exists a linear transformation $T\colon V \to V$ such that $T(v_j) = v_j + v_{j-1}$ for $j = 1, 2, \ldots, n$. Compute $[T]_\beta$.

11. Let $V$ be an $n$-dimensional vector space, and let $T\colon V \to V$ be a linear transformation. Suppose that $W$ is a $T$-invariant subspace of $V$ (see the exercises of Section 2.1) having dimension $k$. Show that there is a basis $\beta$ for $V$ such that $[T]_\beta$ has the form
$$\begin{pmatrix} A & B \\ O & C \end{pmatrix},$$
where $A$ is a $k \times k$ matrix and $O$ is the $(n - k) \times k$ zero matrix.
12. Let $V$ be a finite-dimensional vector space and $T$ be the projection on $W$ along $W'$, where $W$ and $W'$ are subspaces of $V$. (See the definition in the exercises of Section 2.1 on page 76.) Find an ordered basis $\beta$ for $V$ such that $[T]_\beta$ is a diagonal matrix.


13. Let $V$ and $W$ be vector spaces, and let $T$ and $U$ be nonzero linear transformations from $V$ into $W$. If $R(T) \cap R(U) = \{0\}$, prove that $\{T, U\}$ is a linearly independent subset of $\mathcal{L}(V, W)$.


14. Let $V = P(\mathbb{R})$, and for $j \ge 1$ define $T_j(f(x)) = f^{(j)}(x)$, where $f^{(j)}(x)$ is the $j$th derivative of $f(x)$. Prove that the set $\{T_1, T_2, \ldots, T_n\}$ is a linearly independent subset of $\mathcal{L}(V)$ for any positive integer $n$.


15. Let $V$ and $W$ be vector spaces, and let $S$ be a subset of $V$. Define $S^0 = \{T \in \mathcal{L}(V, W) : T(x) = 0 \text{ for all } x \in S\}$. Prove the following statements.

(a) $S^0$ is a subspace of $\mathcal{L}(V, W)$.
(b) If $S_1$ and $S_2$ are subsets of $V$ and $S_1 \subseteq S_2$, then $S_2^0 \subseteq S_1^0$.
(c) If $V_1$ and $V_2$ are subspaces of $V$, then $(V_1 + V_2)^0 = V_1^0 \cap V_2^0$.


16. Let $V$ and $W$ be vector spaces such that $\dim(V) = \dim(W)$, and let $T\colon V \to W$ be linear. Show that there exist ordered bases $\beta$ and $\gamma$ for $V$ and $W$, respectively, such that $[T]_\beta^\gamma$ is a diagonal matrix.


2.3. COMPOSITION OF LINEAR TRANSFORMATIONS 
AND MATRIX MULTIPLICATION 


In Section 2.2, we learned how to associate a matrix with a linear transforma- 
tion in such a way that both sums and scalar multiples of matrices are associ- 
ated with the corresponding sums and scalar multiples of the transformations. 
The question now arises as to how the matrix representation of a composite 
of linear transformations is related to the matrix representation of each of the 
associated linear transformations. The attempt to answer this question leads 
to a definition of matrix multiplication. We use the more convenient notation
of $UT$ rather than $U \circ T$ for the composite of linear transformations $U$ and $T$.
(See Appendix B.) 

Our first result shows that the composite of linear transformations is lin- 
ear. 


Theorem 2.9. Let $V$, $W$, and $Z$ be vector spaces over the same field $F$, and let $T\colon V \to W$ and $U\colon W \to Z$ be linear. Then $UT\colon V \to Z$ is linear.


Proof. Let $x, y \in V$ and $a \in F$. Then
$$\begin{aligned} UT(ax + y) &= U(T(ax + y)) = U(aT(x) + T(y)) \\ &= aU(T(x)) + U(T(y)) = a(UT)(x) + UT(y). \end{aligned}$$ ∎
The following theorem lists some of the properties of the composition of linear transformations.


Theorem 2.10. Let $V$ be a vector space. Let $T, U_1, U_2 \in \mathcal{L}(V)$. Then
(a) $T(U_1 + U_2) = TU_1 + TU_2$ and $(U_1 + U_2)T = U_1T + U_2T$
(b) $T(U_1U_2) = (TU_1)U_2$
(c) $TI = IT = T$
(d) $a(U_1U_2) = (aU_1)U_2 = U_1(aU_2)$ for all scalars $a$.


Proof. Exercise. | 


A more general result holds for linear transformations that have domains 
unequal to their codomains. (See Exercise 8.) 

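Let $T\colon V \to W$ and $U\colon W \to Z$ be linear transformations, and let $A = [U]_\beta^\gamma$ and $B = [T]_\alpha^\beta$, where $\alpha = \{v_1, v_2, \ldots, v_n\}$, $\beta = \{w_1, w_2, \ldots, w_m\}$, and $\gamma = \{z_1, z_2, \ldots, z_p\}$ are ordered bases for $V$, $W$, and $Z$, respectively. We would like to define the product $AB$ of two matrices so that $AB = [UT]_\alpha^\gamma$. Consider the matrix $[UT]_\alpha^\gamma$. For $1 \le j \le n$, we have
$$\begin{aligned} (UT)(v_j) &= U(T(v_j)) = U\left(\sum_{k=1}^m B_{kj}w_k\right) = \sum_{k=1}^m B_{kj}U(w_k) \\ &= \sum_{k=1}^m B_{kj}\left(\sum_{i=1}^p A_{ik}z_i\right) = \sum_{i=1}^p \left(\sum_{k=1}^m A_{ik}B_{kj}\right) z_i = \sum_{i=1}^p C_{ij}z_i, \end{aligned}$$
where
$$C_{ij} = \sum_{k=1}^m A_{ik}B_{kj}.$$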

This computation motivates the following definition of matrix multiplication. 


Definition. Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix. We define the product of $A$ and $B$, denoted $AB$, to be the $m \times p$ matrix such that
$$(AB)_{ij} = \sum_{k=1}^n A_{ik}B_{kj} \quad\text{for } 1 \le i \le m,\ 1 \le j \le p.$$


Note that $(AB)_{ij}$ is the sum of products of corresponding entries from the
ith row of A and the jth column of B. Some interesting applications of this 
definition are presented at the end of this section. 
The reader should observe that in order for the product $AB$ to be defined, there are restrictions regarding the relative sizes of $A$ and $B$. The following mnemonic device is helpful: “$(m \times n)\cdot(n \times p) = (m \times p)$”; that is, in order for the product $AB$ to be defined, the two “inner” dimensions must be equal, and the two “outer” dimensions yield the size of the product.


Example 1 
We have 


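$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & 4 & -1 \end{pmatrix}\begin{pmatrix} 4 \\ 2 \\ 5 \end{pmatrix} = \begin{pmatrix} 1\cdot 4 + 2\cdot 2 + 1\cdot 5 \\ 0\cdot 4 + 4\cdot 2 + (-1)\cdot 5 \end{pmatrix} = \begin{pmatrix} 13 \\ 3 \end{pmatrix}.$$
Notice again the symbolic relationship $(2 \times 3)\cdot(3 \times 1) = 2 \times 1$. ◆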
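The definition translates directly into code. The sketch below is a minimal Python rendering (plain lists of rows, no external libraries); the helper name `matmul` is our own and not part of the text. It refuses products whose inner dimensions disagree, mirroring the mnemonic above, and reproduces the product of Example 1.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B, both lists of rows.

    Implements (AB)_ij = sum_{k=1}^{n} A_ik B_kj directly from the definition.
    """
    m, n = len(A), len(A[0])
    if len(B) != n:
        # The "inner" dimensions must agree: (m x n)(n x p) = (m x p).
        raise ValueError(f"cannot multiply {m}x{n} by {len(B)}x{len(B[0])}")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


A = [[1, 2, 1],
     [0, 4, -1]]       # 2 x 3
B = [[4], [2], [5]]    # 3 x 1
print(matmul(A, B))    # [[13], [3]], a 2 x 1 matrix
```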


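As is the case with composition of functions, matrix multiplication is not commutative. Consider the following two products:
$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}.$$
Hence we see that even if both of the matrix products $AB$ and $BA$ are defined, it need not be true that $AB = BA$.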
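The two products above can be checked mechanically. This is a small Python sketch using a hypothetical `matmul` helper that assumes compatible sizes; it simply confirms that the same pair of $2 \times 2$ matrices gives different products in the two orders.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj; sizes are assumed compatible here.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 1], [0, 0]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))  # [[1, 1], [0, 0]]
print(matmul(B, A))  # [[0, 0], [1, 1]]
```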
Recalling the definition of the transpose of a matrix from Section 1.3, we show that if $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, then $(AB)^t = B^tA^t$. Since
$$(AB)^t_{ij} = (AB)_{ji} = \sum_{k=1}^n A_{jk}B_{ki}$$
and
$$(B^tA^t)_{ij} = \sum_{k=1}^n (B^t)_{ik}(A^t)_{kj} = \sum_{k=1}^n B_{ki}A_{jk},$$
we are finished. Therefore the transpose of a product is the product of the transposes in the opposite order.
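A quick numerical check of $(AB)^t = B^tA^t$ on one pair of sample matrices is sketched below in Python; the matrices are arbitrary illustrative data, and a single example is of course not a proof. Note that $A^tB^t$ is not even the right size here ($3 \times 3$ rather than $2 \times 2$), which is why the order must be reversed.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    # (M^t)_ij = M_ji: the rows of M^t are the columns of M.
    return [list(col) for col in zip(*M)]


A = [[1, 2, 0], [-1, 3, 4]]      # 2 x 3 (sample data)
B = [[2, 1], [0, -1], [5, 2]]    # 3 x 2 (sample data)
print(transpose(matmul(A, B)))               # [[2, 18], [-1, 4]]
print(matmul(transpose(B), transpose(A)))    # [[2, 18], [-1, 4]]
```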

The next theorem is an immediate consequence of our definition of matrix 
multiplication. 


Theorem 2.11. Let $V$, $W$, and $Z$ be finite-dimensional vector spaces with ordered bases $\alpha$, $\beta$, and $\gamma$, respectively. Let $T\colon V \to W$ and $U\colon W \to Z$ be linear transformations. Then
$$[UT]_\alpha^\gamma = [U]_\beta^\gamma[T]_\alpha^\beta.$$


Corollary. Let $V$ be a finite-dimensional vector space with an ordered basis $\beta$. Let $T, U \in \mathcal{L}(V)$. Then $[UT]_\beta = [U]_\beta[T]_\beta$.
We illustrate Theorem 2.11 in the next example. 


Example 2 


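Let $U\colon P_3(\mathbb{R}) \to P_2(\mathbb{R})$ and $T\colon P_2(\mathbb{R}) \to P_3(\mathbb{R})$ be the linear transformations respectively defined by
$$U(f(x)) = f'(x) \quad\text{and}\quad T(f(x)) = \int_0^x f(t)\,dt.$$
Let $\alpha$ and $\beta$ be the standard ordered bases of $P_3(\mathbb{R})$ and $P_2(\mathbb{R})$, respectively. From calculus, it follows that $UT = I$, the identity transformation on $P_2(\mathbb{R})$. To illustrate Theorem 2.11, observe that
$$[UT]_\beta = [U]_\alpha^\beta[T]_\beta^\alpha = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = [I]_\beta.$$ ◆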
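The computation in Example 2 can be reproduced with exact rational arithmetic. The Python sketch below builds $[U]_\alpha^\beta$ (differentiation) and $[T]_\beta^\alpha$ (integration from 0) column by column from their effect on the monomials $x^j$, using the standard-library `fractions` module; the helper name `matmul` is our own. As a side observation, it also shows that the product in the other order is not $I_4$, since integrating a derivative loses the constant term.

```python
from fractions import Fraction


def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# [U]_alpha^beta for U = d/dx on P_3(R): d/dx x^j = j x^(j-1),
# so column j has the entry j in row j - 1.
U = [[Fraction(j) if i == j - 1 else Fraction(0) for j in range(4)]
     for i in range(3)]

# [T]_beta^alpha for T(f)(x) = integral_0^x f(t) dt on P_2(R):
# x^j integrates to x^(j+1) / (j+1), so column j has 1/(j+1) in row j + 1.
T = [[Fraction(1, j + 1) if i == j + 1 else Fraction(0) for j in range(3)]
     for i in range(4)]

I3 = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
print(matmul(U, T) == I3)   # True: [UT]_beta = I_3

# The other order loses the constant term: TU = diag(0, 1, 1, 1), not I_4.
print([[int(x) for x in row] for row in matmul(T, U)])
# [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```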


The preceding 3 x 3 diagonal matrix is called an identity matrix and is 
defined next, along with a very useful notation, the Kronecker delta. 


Definitions. We define the Kronecker delta $\delta_{ij}$ by $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$. The $n \times n$ identity matrix $I_n$ is defined by $(I_n)_{ij} = \delta_{ij}$.


Thus, for example, 


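$$I_1 = (1), \quad I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad\text{and}\quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$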


The next theorem provides analogs of (a), (c), and (d) of Theorem 2.10. 
Theorem 2.10(b) has its analog in Theorem 2.16. Observe also that part (c) of 
the next theorem illustrates that the identity matrix acts as a multiplicative 
identity in $M_{n\times n}(F)$. When the context is clear, we sometimes omit the
subscript n from I. 


Theorem 2.12. Let $A$ be an $m \times n$ matrix, $B$ and $C$ be $n \times p$ matrices, and $D$ and $E$ be $q \times m$ matrices. Then
(a) $A(B + C) = AB + AC$ and $(D + E)A = DA + EA$.
(b) $a(AB) = (aA)B = A(aB)$ for any scalar $a$.
(c) $I_mA = A = AI_n$.
(d) If $V$ is an $n$-dimensional vector space with an ordered basis $\beta$, then $[I_V]_\beta = I_n$.
Proof. We prove the first half of (a) and (c) and leave the remaining proofs as an exercise. (See Exercise 5.)

(a) We have
$$\begin{aligned} [A(B + C)]_{ij} &= \sum_{k=1}^n A_{ik}(B + C)_{kj} = \sum_{k=1}^n A_{ik}(B_{kj} + C_{kj}) \\ &= \sum_{k=1}^n (A_{ik}B_{kj} + A_{ik}C_{kj}) = \sum_{k=1}^n A_{ik}B_{kj} + \sum_{k=1}^n A_{ik}C_{kj} \\ &= (AB)_{ij} + (AC)_{ij} = [AB + AC]_{ij}. \end{aligned}$$
So $A(B + C) = AB + AC$.
(c) We have
$$(I_mA)_{ij} = \sum_{k=1}^m (I_m)_{ik}A_{kj} = \sum_{k=1}^m \delta_{ik}A_{kj} = A_{ij}.$$ ∎
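Corollary. Let $A$ be an $m \times n$ matrix, $B_1, B_2, \ldots, B_k$ be $n \times p$ matrices, $C_1, C_2, \ldots, C_k$ be $q \times m$ matrices, and $a_1, a_2, \ldots, a_k$ be scalars. Then
$$A\left(\sum_{i=1}^k a_iB_i\right) = \sum_{i=1}^k a_iAB_i$$
and
$$\left(\sum_{i=1}^k a_iC_i\right)A = \sum_{i=1}^k a_iC_iA.$$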


Proof. Exercise. | 


For an $n \times n$ matrix $A$, we define $A^1 = A$, $A^2 = AA$, $A^3 = A^2A$, and, in general, $A^k = A^{k-1}A$ for $k = 2, 3, \ldots$. We define $A^0 = I$.
With this notation, we see that if
$$A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$
then $A^2 = O$ (the zero matrix) even though $A \ne O$. Thus the cancellation property for multiplication in fields is not valid for matrices. To see why, assume that the cancellation law is valid. Then, from $A \cdot A = A^2 = O = A \cdot O$, we would conclude that $A = O$, which is false.
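The recursive definition of powers and the failure of cancellation can both be seen in a few lines. This Python sketch (helper names `matmul`, `identity`, and `matpow` are our own) computes $A^k$ exactly as defined, starting from $A^0 = I$ and multiplying on the right by $A$ each step.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    # (I_n)_ij = delta_ij
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matpow(A, k):
    """A^k via the recursive definition A^0 = I, A^k = A^(k-1) A."""
    result = identity(len(A))
    for _ in range(k):
        result = matmul(result, A)
    return result


A = [[0, 0], [1, 0]]
O = [[0, 0], [0, 0]]
print(matpow(A, 2) == O)  # True
print(A == O)             # False: A^2 = O although A != O
```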


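Theorem 2.13. Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix. For each $j$ ($1 \le j \le p$) let $u_j$ and $v_j$ denote the $j$th columns of $AB$ and $B$, respectively. Then
(a) $u_j = Av_j$
(b) $v_j = Be_j$, where $e_j$ is the $j$th standard vector of $F^p$.


Proof. (a) We have
$$u_j = \begin{pmatrix} (AB)_{1j} \\ (AB)_{2j} \\ \vdots \\ (AB)_{mj} \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^n A_{1k}B_{kj} \\ \sum_{k=1}^n A_{2k}B_{kj} \\ \vdots \\ \sum_{k=1}^n A_{mk}B_{kj} \end{pmatrix} = A\begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{nj} \end{pmatrix} = Av_j.$$
Hence (a) is proved. The proof of (b) is left as an exercise. (See Exercise 6.)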


It follows (see Exercise 14) from Theorem 2.13 that column j of AB is 
a linear combination of the columns of A with the coefficients in the linear 
combination being the entries of column j of B. An analogous result holds 
for rows; that is, row i of AB is a linear combination of the rows of B with 
the coefficients in the linear combination being the entries of row i of A. 
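Both statements can be checked on sample data. The Python sketch below uses arbitrary illustrative matrices and a hypothetical helper `lin_comb` that forms $\sum_k c_k w_k$; it asserts that each column of $AB$ is the combination of the columns of $A$ with coefficients taken from the corresponding column of $B$, and that each row of $AB$ is the combination of the rows of $B$ with coefficients from the corresponding row of $A$.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def lin_comb(coeffs, vectors):
    """Return sum_k coeffs[k] * vectors[k] for equal-length lists."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]


def column(M, j):
    return [row[j] for row in M]


A = [[1, 2], [3, 4], [5, 6]]       # 3 x 2 (sample data)
B = [[7, 8, 9], [10, 11, 12]]      # 2 x 3 (sample data)
AB = matmul(A, B)

cols_of_A = [column(A, k) for k in range(2)]
for j in range(3):
    # Column j of AB = combination of A's columns, coefficients from column j of B.
    assert column(AB, j) == lin_comb(column(B, j), cols_of_A)
for i in range(3):
    # Row i of AB = combination of B's rows, coefficients from row i of A.
    assert AB[i] == lin_comb(A[i], B)
print(AB)  # [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
```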

The next result justifies much of our past work. It utilizes both the matrix 
representation of a linear transformation and matrix multiplication in order 
to evaluate the transformation at any given vector. 


Theorem 2.14. Let $V$ and $W$ be finite-dimensional vector spaces having ordered bases $\beta$ and $\gamma$, respectively, and let $T\colon V \to W$ be linear. Then, for each $u \in V$, we have
$$[T(u)]_\gamma = [T]_\beta^\gamma[u]_\beta.$$

Proof. Fix $u \in V$, and define the linear transformations $f\colon F \to V$ by $f(a) = au$ and $g\colon F \to W$ by $g(a) = aT(u)$ for all $a \in F$. Let $\alpha = \{1\}$ be the standard ordered basis for $F$. Notice that $g = Tf$. Identifying column vectors as matrices and using Theorem 2.11, we obtain
$$[T(u)]_\gamma = [g(1)]_\gamma = [g]_\alpha^\gamma = [Tf]_\alpha^\gamma = [T]_\beta^\gamma[f]_\alpha^\beta = [T]_\beta^\gamma[f(1)]_\beta = [T]_\beta^\gamma[u]_\beta.$$ ∎
Example 3 


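Let $T\colon P_3(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by $T(f(x)) = f'(x)$, and let $\beta$ and $\gamma$ be the standard ordered bases for $P_3(\mathbb{R})$ and $P_2(\mathbb{R})$, respectively. If $A = [T]_\beta^\gamma$, then, from Example 4 of Section 2.2, we have
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$
We illustrate Theorem 2.14 by verifying that $[T(p(x))]_\gamma = [T]_\beta^\gamma[p(x)]_\beta$, where $p(x) \in P_3(\mathbb{R})$ is the polynomial $p(x) = 2 - 4x + x^2 + 3x^3$. Let $q(x) = T(p(x))$; then $q(x) = p'(x) = -4 + 2x + 9x^2$. Hence
$$[T(p(x))]_\gamma = [q(x)]_\gamma = \begin{pmatrix} -4 \\ 2 \\ 9 \end{pmatrix},$$
but also
$$[T]_\beta^\gamma[p(x)]_\beta = A[p(x)]_\beta = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 2 \\ -4 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} -4 \\ 2 \\ 9 \end{pmatrix}.$$ ◆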
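The same verification can be phrased in code: differentiate the coefficient list directly, then compare with the matrix-vector product. In this Python sketch a polynomial $c_0 + c_1x + \cdots + c_nx^n$ is represented by its coefficient list, which is exactly its coordinate vector relative to the standard ordered basis; the helper name `derivative` is our own.

```python
def derivative(p):
    """Coefficients of p'(x) given p = [c0, c1, ..., cn] for c0 + c1 x + ... + cn x^n."""
    return [k * p[k] for k in range(1, len(p))]


A = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]          # [T]_beta^gamma from Example 3
p = [2, -4, 1, 3]           # [p(x)]_beta for p(x) = 2 - 4x + x^2 + 3x^3

lhs = derivative(p)                                              # [T(p(x))]_gamma
rhs = [sum(A[i][k] * p[k] for k in range(4)) for i in range(3)]  # A [p(x)]_beta
print(lhs, rhs)  # [-4, 2, 9] [-4, 2, 9]
```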


We complete this section with the introduction of the left-multiplication transformation $L_A$, where $A$ is an $m \times n$ matrix. This transformation is probably the most important tool for transferring properties about transformations to analogous properties about matrices and vice versa. For example, we use it to prove that matrix multiplication is associative.


Definition. Let $A$ be an $m \times n$ matrix with entries from a field $F$. We denote by $L_A$ the mapping $L_A\colon F^n \to F^m$ defined by $L_A(x) = Ax$ (the matrix product of $A$ and $x$) for each column vector $x \in F^n$. We call $L_A$ a left-multiplication transformation.

Example 4 
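Let
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}.$$
Then $A \in M_{2\times 3}(\mathbb{R})$ and $L_A\colon \mathbb{R}^3 \to \mathbb{R}^2$. If
$$x = \begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix},$$
then
$$L_A(x) = Ax = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix} = \begin{pmatrix} 6 \\ 1 \end{pmatrix}.$$ ◆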


We see in the next theorem that not only is $L_A$ linear, but, in fact, it has
a great many other useful properties. These properties are all quite natural 
and so are easy to remember. 
Theorem 2.15. Let $A$ be an $m \times n$ matrix with entries from $F$. Then the left-multiplication transformation $L_A\colon F^n \to F^m$ is linear. Furthermore, if $B$ is any other $m \times n$ matrix (with entries from $F$) and $\beta$ and $\gamma$ are the standard ordered bases for $F^n$ and $F^m$, respectively, then we have the following properties.

(a) $[L_A]_\beta^\gamma = A$.
(b) $L_A = L_B$ if and only if $A = B$.
(c) $L_{A+B} = L_A + L_B$ and $L_{aA} = aL_A$ for all $a \in F$.
(d) If $T\colon F^n \to F^m$ is linear, then there exists a unique $m \times n$ matrix $C$ such that $T = L_C$. In fact, $C = [T]_\beta^\gamma$.
(e) If $E$ is an $n \times p$ matrix, then $L_{AE} = L_AL_E$.
(f) If $m = n$, then $L_{I_n} = I_{F^n}$.


Proof. The fact that $L_A$ is linear follows immediately from Theorem 2.12.

(a) The $j$th column of $[L_A]_\beta^\gamma$ is equal to $L_A(e_j)$. However $L_A(e_j) = Ae_j$, which is also the $j$th column of $A$ by Theorem 2.13(b). So $[L_A]_\beta^\gamma = A$.

(b) If $L_A = L_B$, then we may use (a) to write $A = [L_A]_\beta^\gamma = [L_B]_\beta^\gamma = B$. Hence $A = B$. The proof of the converse is trivial.

(c) The proof is left as an exercise. (See Exercise 7.)

(d) Let $C = [T]_\beta^\gamma$. By Theorem 2.14, we have $[T(x)]_\gamma = [T]_\beta^\gamma[x]_\beta$, or $T(x) = Cx = L_C(x)$ for all $x \in F^n$. So $T = L_C$. The uniqueness of $C$ follows from (b).

(e) For any $j$ ($1 \le j \le p$), we may apply Theorem 2.13 several times to note that $(AE)e_j$ is the $j$th column of $AE$ and that the $j$th column of $AE$ is also equal to $A(Ee_j)$. So $(AE)e_j = A(Ee_j)$. Thus
$$L_{AE}(e_j) = (AE)e_j = A(Ee_j) = L_A(Ee_j) = L_A(L_E(e_j)).$$
Hence $L_{AE} = L_AL_E$ by the corollary to Theorem 2.6 (p. 73).

(f) The proof is left as an exercise. (See Exercise 7.) ∎
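Properties (a) and (e) can be illustrated by treating $L_A$ as an ordinary Python function. In the sketch below, `left_mult` returns the map $x \mapsto Ax$ and `matrix_of` rebuilds a matrix from a linear map by applying it to the standard vectors (both names are our own, and the matrices are sample data). Recovering $A$ from $L_A$ mirrors (a); recovering $AE$ from the composite $L_AL_E$ mirrors (e).

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def left_mult(A):
    """Return L_A: F^n -> F^m, x -> Ax, with vectors as plain lists."""
    def L(x):
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return L


def matrix_of(L, n):
    """[L]_beta^gamma for standard bases: column j is L(e_j)."""
    cols = [L([int(i == j) for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*cols)]


A = [[1, 2, 1], [0, 1, 2]]         # 2 x 3 (sample data)
E = [[1, 0], [2, -1], [0, 3]]      # 3 x 2 (sample data)
L_A, L_E = left_mult(A), left_mult(E)


def composite(x):
    # The composition L_A L_E: first apply L_E, then L_A.
    return L_A(L_E(x))


print(matrix_of(L_A, 3) == A)                    # True, property (a)
print(matrix_of(composite, 2) == matmul(A, E))   # True, property (e)
```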


We now use left-multiplication transformations to establish the associa- 
tivity of matrix multiplication. 


Theorem 2.16. Let $A$, $B$, and $C$ be matrices such that $A(BC)$ is defined. Then $(AB)C$ is also defined and $A(BC) = (AB)C$; that is, matrix multiplication is associative.


Proof. It is left to the reader to show that $(AB)C$ is defined. Using (e) of Theorem 2.15 and the associativity of functional composition (see Appendix B), we have
$$L_{A(BC)} = L_AL_{BC} = L_A(L_BL_C) = (L_AL_B)L_C = L_{AB}L_C = L_{(AB)C}.$$
So from (b) of Theorem 2.15, it follows that $A(BC) = (AB)C$. ∎
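As a sanity check (not a substitute for the proof), associativity can be tested on many randomly sized integer matrices. The Python sketch below uses the standard-library `random.Random` with a fixed seed so the run is reproducible; integer arithmetic is exact, so equality is tested without rounding concerns.

```python
import random


def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def random_matrix(rng, rows, cols):
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the run is reproducible
for _ in range(100):
    m, n, p, q = (rng.randint(1, 4) for _ in range(4))
    A = random_matrix(rng, m, n)   # m x n
    B = random_matrix(rng, n, p)   # n x p
    C = random_matrix(rng, p, q)   # p x q
    assert matmul(A, matmul(B, C)) == matmul(matmul(A, B), C)
print("A(BC) == (AB)C on 100 random integer triples")
```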
Needless to say, this theorem could be proved directly from the definition
of matrix multiplication (see Exercise 18). The proof above, however, provides 
a prototype of many of the arguments that utilize the relationships between 
linear transformations and matrices. 


Applications 


A large and varied collection of interesting applications arises in connection with special matrices called incidence matrices. An incidence matrix is a square matrix in which all the entries are either zero or one and, for convenience, all the diagonal entries are zero. If we have a relationship on a set of $n$ objects that we denote by $1, 2, \ldots, n$, then we define the associated incidence matrix $A$ by $A_{ij} = 1$ if $i$ is related to $j$, and $A_{ij} = 0$ otherwise.

To make things concrete, suppose that we have four people, each of whom owns a communication device. If the relationship on this group is “can transmit to,” then $A_{ij} = 1$ if $i$ can send a message to $j$, and $A_{ij} = 0$ otherwise. Suppose that
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix}.$$
Then since $A_{34} = 1$ and $A_{14} = 0$, we see that person 3 can send to 4 but 1 cannot send to 4.

We obtain an interesting interpretation of the entries of $A^2$. Consider, for instance,
$$(A^2)_{31} = A_{31}A_{11} + A_{32}A_{21} + A_{33}A_{31} + A_{34}A_{41}.$$
Note that any term $A_{3k}A_{k1}$ equals 1 if and only if both $A_{3k}$ and $A_{k1}$ equal 1, that is, if and only if 3 can send to $k$ and $k$ can send to 1. Thus $(A^2)_{31}$ gives the number of ways in which 3 can send to 1 in two stages (or in one relay). Since
$$A^2 = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 2 & 0 & 0 \\ 2 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix},$$
we see that there are two ways 3 can send to 1 in two stages. In general, $(A + A^2 + \cdots + A^m)_{ij}$ is the number of ways in which $i$ can send to $j$ in at most $m$ stages.
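The counting interpretation is easy to reproduce. The Python sketch below writes out the four-person incidence matrix of this example explicitly and accumulates $A + A^2 + \cdots + A^m$; the helper name `reach_counts` is our own. Note that the count includes every relay sequence, even one that passes a message back through the sender (for instance, $(A^2)_{11} = 1$ via $1 \to 2 \to 1$).

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def reach_counts(A, m):
    """Return A + A^2 + ... + A^m for a square incidence matrix A."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    total = [[0] * n for _ in range(n)]
    for _ in range(m):
        power = matmul(power, A)                                  # next power of A
        total = [[t + q for t, q in zip(trow, prow)]
                 for trow, prow in zip(total, power)]
    return total


A = [[0, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 1, 0, 1],
     [1, 1, 0, 0]]
A2 = matmul(A, A)
print(A2)          # [[1, 0, 0, 1], [1, 2, 0, 0], [2, 1, 0, 1], [1, 1, 0, 1]]
print(A2[2][0])    # 2, i.e. (A^2)_31 in the text's 1-based indexing
```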

A maximal collection of three or more people with the property that any two can send to each other is called a clique. The problem of determining cliques is difficult, but there is a simple method for determining if someone belongs to a clique. If we define a new matrix $B$ by $B_{ij} = 1$ if $i$ and $j$ can send to each other, and $B_{ij} = 0$ otherwise, then it can be shown (see Exercise 19) that person $i$ belongs to a clique if and only if $(B^3)_{ii} > 0$. For example, suppose that the incidence matrix associated with some relationship is


0 


1 
0 
oz 1 
1 


ee 


1 
0 
1 
0 


are 


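To determine which people belong to cliques, we form the matrix $B$, described earlier, and compute $B^3$. In this case,
$$B = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix} \quad\text{and}\quad B^3 = \begin{pmatrix} 0 & 4 & 0 & 4 \\ 4 & 0 & 4 & 0 \\ 0 & 4 & 0 & 4 \\ 4 & 0 & 4 & 0 \end{pmatrix}.$$
Since all the diagonal entries of $B^3$ are zero, we conclude that there are no cliques in this relationship.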
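The clique test can be packaged as a short function. In the Python sketch below, `clique_members` (our own name) builds the mutual-relation matrix $B$ from an incidence matrix, computes $B^3$, and reports the people $i$ with $(B^3)_{ii} > 0$. Applied to the symmetric matrix $B$ of this example it finds no one; the second input is a hypothetical three-person group in which everyone can reach everyone, where all three belong to a clique.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def clique_members(A):
    """1-based labels i with (B^3)_ii > 0, where B_ij = 1 iff A_ij = A_ji = 1."""
    n = len(A)
    B = [[int(A[i][j] == 1 and A[j][i] == 1) for j in range(n)] for i in range(n)]
    B3 = matmul(matmul(B, B), B)
    return [i + 1 for i in range(n) if B3[i][i] > 0]


B_text = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]
# B_text is symmetric, so it is its own "mutual" matrix.
print(clique_members(B_text))    # []

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # hypothetical: three people, all mutual
print(clique_members(triangle))  # [1, 2, 3]
```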

Our final example of the use of incidence matrices is concerned with the concept of dominance. A relation among a group of people is called a dominance relation if the associated incidence matrix $A$ has the property that for all distinct pairs $i$ and $j$, $A_{ij} = 1$ if and only if $A_{ji} = 0$, that is, given any two people, exactly one of them dominates (or, using the terminology of our first example, can send a message to) the other. Since $A$ is an incidence matrix, $A_{ii} = 0$ for all $i$. For such a relation, it can be shown (see Exercise 21) that the matrix $A + A^2$ has a row [column] in which each entry is positive except for the diagonal entry. In other words, there is at least one person who dominates [is dominated by] all others in one or two stages. In fact, it can be shown that any person who dominates [is dominated by] the greatest number of people in the first stage has this property. Consider, for example, the matrix


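$$A = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 \end{pmatrix}.$$
The reader should verify that this matrix corresponds to a dominance relation. Now
$$A + A^2 = \begin{pmatrix} 0 & 2 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 2 & 0 & 2 & 1 \\ 1 & 2 & 2 & 0 & 1 \\ 2 & 2 & 2 & 2 & 0 \end{pmatrix}.$$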
Thus persons 1, 3, 4, and 5 dominate (can send messages to) all the others
in at most two stages, while persons 1, 2, 3, and 4 are dominated by (can 
receive messages from) all the others in at most two stages. 
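The two-stage dominance analysis can be automated. The Python sketch below (function names are our own) checks the dominance property, forms $A + A^2$, and lists the people whose row [column] is positive off the diagonal. It writes the five-person matrix out explicitly and assumes all entries are 0 or 1; its output reproduces the conclusion just stated, that persons 1, 3, 4, 5 dominate and persons 1, 2, 3, 4 are dominated within two stages.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_dominance(A):
    """Zero diagonal, and for i != j exactly one of A_ij, A_ji is 1 (0/1 entries assumed)."""
    n = len(A)
    return all(A[i][i] == 0 for i in range(n)) and all(
        A[i][j] + A[j][i] == 1 for i in range(n) for j in range(n) if i != j)


def two_stage(A):
    """Return A + A^2, the persons dominating all others, and those dominated by all others."""
    n = len(A)
    A2 = matmul(A, A)
    S = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A, A2)]
    dominators = [i + 1 for i in range(n)
                  if all(S[i][j] > 0 for j in range(n) if j != i)]
    dominated = [j + 1 for j in range(n)
                 if all(S[i][j] > 0 for i in range(n) if i != j)]
    return S, dominators, dominated


A = [[0, 1, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [1, 0, 0, 1, 0],
     [0, 1, 0, 0, 1],
     [1, 1, 1, 0, 0]]
S, dominators, dominated = two_stage(A)
print(is_dominance(A), dominators, dominated)  # True [1, 3, 4, 5] [1, 2, 3, 4]
```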


EXERCISES 


1. Label the following statements as true or false. In each part, $V$, $W$, and $Z$ denote vector spaces with ordered (finite) bases $\alpha$, $\beta$, and $\gamma$, respectively; $T\colon V \to W$ and $U\colon W \to Z$ denote linear transformations; and $A$ and $B$ denote matrices.

(a) $[UT]_\alpha^\gamma = [T]_\alpha^\beta[U]_\beta^\gamma$.
(b) $[T(v)]_\beta = [T]_\alpha^\beta[v]_\alpha$ for all $v \in V$.
(c) $[U(w)]_\beta = [U]_\alpha^\beta[w]_\beta$ for all $w \in W$.
(d) $[I_V]_\alpha = I$.
(e) $[T^2]_\alpha^\beta = ([T]_\alpha^\beta)^2$.
(f) $A^2 = I$ implies that $A = I$ or $A = -I$.
(g) $T = L_A$ for some matrix $A$.
(h) $A^2 = O$ implies that $A = O$, where $O$ denotes the zero matrix.
(i) $L_{A+B} = L_A + L_B$.
(j) If $A$ is square and $A_{ij} = \delta_{ij}$ for all $i$ and $j$, then $A = I$.


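2. (a) Let
$$A = \begin{pmatrix} 1 & 3 \\ 2 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & -3 \\ 4 & 1 & 2 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 1 & 4 \\ -1 & -2 & 0 \end{pmatrix}, \quad\text{and}\quad D = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}.$$
Compute $A(2B + 3C)$, $(AB)D$, and $A(BD)$.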
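(b) Let
$$A = \begin{pmatrix} 2 & 5 \\ -3 & 1 \\ 4 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & -2 & 0 \\ 1 & -1 & 4 \\ 5 & 5 & 3 \end{pmatrix}, \quad\text{and}\quad C = \begin{pmatrix} 4 & 0 & 3 \end{pmatrix}.$$
Compute $A^t$, $A^tB$, $BC^t$, $CB$, and $CA$.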


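3. Let $g(x) = 3 + x$. Let $T\colon P_2(\mathbb{R}) \to P_2(\mathbb{R})$ and $U\colon P_2(\mathbb{R}) \to \mathbb{R}^3$ be the linear transformations respectively defined by
$$T(f(x)) = f'(x)g(x) + 2f(x) \quad\text{and}\quad U(a + bx + cx^2) = (a + b,\ c,\ a - b).$$
Let $\beta$ and $\gamma$ be the standard ordered bases of $P_2(\mathbb{R})$ and $\mathbb{R}^3$, respectively.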


(a) Compute $[U]_\beta^\gamma$, $[T]_\beta$, and $[UT]_\beta^\gamma$ directly. Then use Theorem 2.11 to verify your result.

(b) Let $h(x) = 3 - 2x + x^2$. Compute $[h(x)]_\beta$ and $[U(h(x))]_\gamma$. Then use $[U]_\beta^\gamma$ from (a) and Theorem 2.14 to verify your result.


4. For each of the following parts, let $T$ be the linear transformation defined in the corresponding part of Exercise 5 of Section 2.2. Use Theorem 2.14 to compute the following vectors:

(a) $[T(A)]_\alpha$, where $A = \ldots$.
(b) $[T(f(x))]_\alpha$, where $f(x) = 4 - 6x + 3x^2$.
(c) $[T(A)]_\gamma$, where $A = \ldots$.
(d) $[T(f(x))]_\gamma$, where $f(x) = 6 - x + 2x^2$.


5. Complete the proof of Theorem 2.12 and its corollary.
6. Prove (b) of Theorem 2.13.
7. Prove (c) and (f) of Theorem 2.15.

8. Prove Theorem 2.10. Now state and prove a more general result involving linear transformations with domains unequal to their codomains.

9. Find linear transformations $U, T\colon F^2 \to F^2$ such that $UT = T_0$ (the zero transformation) but $TU \ne T_0$. Use your answer to find matrices $A$ and $B$ such that $AB = O$ but $BA \ne O$.

10. Let $A$ be an $n \times n$ matrix. Prove that $A$ is a diagonal matrix if and only if $A_{ij} = \delta_{ij}A_{ij}$ for all $i$ and $j$.

11. Let $V$ be a vector space, and let $T\colon V \to V$ be linear. Prove that $T^2 = T_0$ if and only if $R(T) \subseteq N(T)$.

12. Let $V$, $W$, and $Z$ be vector spaces, and let $T\colon V \to W$ and $U\colon W \to Z$ be linear.

(a) Prove that if $UT$ is one-to-one, then $T$ is one-to-one. Must $U$ also be one-to-one?
(b) Prove that if $UT$ is onto, then $U$ is onto. Must $T$ also be onto?
(c) Prove that if $U$ and $T$ are one-to-one and onto, then $UT$ is also.

13. Let $A$ and $B$ be $n \times n$ matrices. Recall that the trace of $A$ is defined by
$$\operatorname{tr}(A) = \sum_{i=1}^n A_{ii}.$$
Prove that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ and $\operatorname{tr}(A) = \operatorname{tr}(A^t)$.
14. Assume the notation in Theorem 2.13.


(a) Suppose that $z$ is a (column) vector in $F^p$. Use Theorem 2.13(b) to prove that $Bz$ is a linear combination of the columns of $B$. In particular, if $z = (a_1, a_2, \ldots, a_p)^t$, then show that
$$Bz = \sum_{j=1}^p a_jv_j.$$

(b) Extend (a) to prove that column $j$ of $AB$ is a linear combination of the columns of $A$ with the coefficients in the linear combination being the entries of column $j$ of $B$.

(c) For any row vector $w \in F^m$, prove that $wA$ is a linear combination of the rows of $A$ with the coefficients in the linear combination being the coordinates of $w$. Hint: Use properties of the transpose operation applied to (a).

(d) Prove the analogous result to (b) about rows: Row $i$ of $AB$ is a linear combination of the rows of $B$ with the coefficients in the linear combination being the entries of row $i$ of $A$.


15. Let $M$ and $A$ be matrices for which the product matrix $MA$ is defined. If the $j$th column of $A$ is a linear combination of a set of columns of $A$, prove that the $j$th column of $MA$ is a linear combination of the corresponding columns of $MA$ with the same corresponding coefficients.

16. Let $V$ be a finite-dimensional vector space, and let $T\colon V \to V$ be linear.

(a) If $\operatorname{rank}(T) = \operatorname{rank}(T^2)$, prove that $R(T) \cap N(T) = \{0\}$. Deduce that $V = R(T) \oplus N(T)$ (see the exercises of Section 1.3).
(b) Prove that $V = R(T^k) \oplus N(T^k)$ for some positive integer $k$.

17. Let $V$ be a vector space. Determine all linear transformations $T\colon V \to V$ such that $T = T^2$. Hint: Note that $x = T(x) + (x - T(x))$ for every $x$ in $V$, and show that $V = \{y : T(y) = y\} \oplus N(T)$ (see the exercises of Section 1.3).

18. Using only the definition of matrix multiplication, prove that multiplication of matrices is associative.

19. For an incidence matrix $A$ with related matrix $B$ defined by $B_{ij} = 1$ if $i$ is related to $j$ and $j$ is related to $i$, and $B_{ij} = 0$ otherwise, prove that $i$ belongs to a clique if and only if $(B^3)_{ii} > 0$.

20. Use Exercise 19 to determine the cliques in the relations corresponding to the following incidence matrices.
Pr r ©
SiS Ore: 
eo: & 
Corr Rr 


0 1 1 
1 O 0 
Gyles all 2b) 
1 O 0 
21. Let $A$ be an incidence matrix that is associated with a dominance relation. Prove that the matrix $A + A^2$ has a row [column] in which each entry is positive except for the diagonal entry.


22. Prove that the matrix 


0 1 
A=|{0 0 
1 0 


Oat 


corresponds to a dominance relation. Use Exercise 21 to determine 
which persons dominate [are dominated by] each of the others within 
two stages. 


23. Let $A$ be an $n \times n$ incidence matrix that corresponds to a dominance
relation. Determine the number of nonzero entries of A. 


2.4 INVERTIBILITY AND ISOMORPHISMS 


The concept of invertibility is introduced quite early in the study of functions. 
Fortunately, many of the intrinsic properties of functions are shared by their 
inverses. For example, in calculus we learn that the properties of being con- 
tinuous or differentiable are generally retained by the inverse functions. We 
see in this section (Theorem 2.17) that the inverse of a linear transformation 
is also linear. This result greatly aids us in the study of inverses of matrices. 
As one might expect from Section 2.3, the inverse of the left-multiplication
transformation $L_A$ (when it exists) can be used to determine properties of the
inverse of the matrix $A$.

In the remainder of this section, we apply many of the results about invertibility to the concept of isomorphism. We will see that finite-dimensional vector spaces (over $F$) of equal dimension may be identified. These ideas will be made precise shortly.

The facts about inverse functions presented in Appendix B are, of course, 
true for linear transformations. Nevertheless, we repeat some of the defini- 
tions for use in this section. 


Definition. Let $V$ and $W$ be vector spaces, and let $T\colon V \to W$ be linear. A function $U\colon W \to V$ is said to be an inverse of $T$ if $TU = I_W$ and $UT = I_V$. If $T$ has an inverse, then $T$ is said to be invertible. As noted in Appendix B, if $T$ is invertible, then the inverse of $T$ is unique and is denoted by $T^{-1}$.
The following facts hold for invertible functions $T$ and $U$.

1. $(TU)^{-1} = U^{-1}T^{-1}$.
2. $(T^{-1})^{-1} = T$; in particular, $T^{-1}$ is invertible.


We often use the fact that a function is invertible if and only if it is both 
one-to-one and onto. We can therefore restate Theorem 2.5 as follows. 


3. Let $T\colon V \to W$ be a linear transformation, where $V$ and $W$ are finite-dimensional spaces of equal dimension. Then $T$ is invertible if and only if $\operatorname{rank}(T) = \dim(V)$.


Example 1 


Let $T\colon P_1(\mathbb{R}) \to \mathbb{R}^2$ be the linear transformation defined by $T(a + bx) = (a, a + b)$. The reader can verify directly that $T^{-1}\colon \mathbb{R}^2 \to P_1(\mathbb{R})$ is defined by $T^{-1}(c, d) = c + (d - c)x$. Observe that $T^{-1}$ is also linear. As Theorem 2.17 demonstrates, this is true in general. ◆


Theorem 2.17. Let $V$ and $W$ be vector spaces, and let $T\colon V \to W$ be linear and invertible. Then $T^{-1}\colon W \to V$ is linear.


Proof. Let $y_1, y_2 \in W$ and $c \in F$. Since $T$ is onto and one-to-one, there exist unique vectors $x_1$ and $x_2$ such that $T(x_1) = y_1$ and $T(x_2) = y_2$. Thus $x_1 = T^{-1}(y_1)$ and $x_2 = T^{-1}(y_2)$; so
$$\begin{aligned} T^{-1}(cy_1 + y_2) &= T^{-1}[cT(x_1) + T(x_2)] = T^{-1}[T(cx_1 + x_2)] \\ &= cx_1 + x_2 = cT^{-1}(y_1) + T^{-1}(y_2). \end{aligned}$$ ∎


It now follows immediately from Theorem 2.5 (p. 71) that if T is a linear 
transformation between vector spaces of equal (finite) dimension, then the 
conditions of being invertible, one-to-one, and onto are all equivalent. 

We are now ready to define the inverse of a matrix. The reader should 
note the analogy with the inverse of a linear transformation. 


Definition. Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if there exists an $n \times n$ matrix $B$ such that $AB = BA = I$.


If $A$ is invertible, then the matrix $B$ such that $AB = BA = I$ is unique. (If $C$ were another such matrix, then $C = CI = C(AB) = (CA)B = IB = B$.) The matrix $B$ is called the inverse of $A$ and is denoted by $A^{-1}$.
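Example 2
The reader should verify that the inverse of
$$\begin{pmatrix} 5 & 7 \\ 2 & 3 \end{pmatrix} \quad\text{is}\quad \begin{pmatrix} 3 & -7 \\ -2 & 5 \end{pmatrix}.$$ ◆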
In Section 3.2, we learn a technique for computing the inverse of a matrix.
At this point, we develop a number of results that relate the inverses of 
matrices to the inverses of linear transformations. 


Lemma. Let $T$ be an invertible linear transformation from $V$ to $W$. Then $V$ is finite-dimensional if and only if $W$ is finite-dimensional. In this case, $\dim(V) = \dim(W)$.


Proof. Suppose that $V$ is finite-dimensional. Let $\beta = \{x_1, x_2, \ldots, x_n\}$ be a basis for $V$. By Theorem 2.2 (p. 68), $T(\beta)$ spans $R(T) = W$; hence $W$ is finite-dimensional by Theorem 1.9 (p. 44). Conversely, if $W$ is finite-dimensional, then so is $V$ by a similar argument, using $T^{-1}$.

Now suppose that $V$ and $W$ are finite-dimensional. Because $T$ is one-to-one and onto, we have
$$\operatorname{nullity}(T) = 0 \quad\text{and}\quad \operatorname{rank}(T) = \dim(R(T)) = \dim(W).$$
So by the dimension theorem (p. 70), $\dim(V) = \operatorname{nullity}(T) + \operatorname{rank}(T) = \dim(W)$. ∎


Theorem 2.18. Let $V$ and $W$ be finite-dimensional vector spaces with ordered bases $\beta$ and $\gamma$, respectively. Let $T\colon V \to W$ be linear. Then $T$ is invertible if and only if $[T]_\beta^\gamma$ is invertible. Furthermore, $[T^{-1}]_\gamma^\beta = ([T]_\beta^\gamma)^{-1}$.


Proof. Suppose that $T$ is invertible. By the lemma, we have $\dim(V) = \dim(W)$. Let $n = \dim(V)$. So $[T]_\beta^\gamma$ is an $n \times n$ matrix. Now $T^{-1}\colon W \to V$ satisfies $TT^{-1} = I_W$ and $T^{-1}T = I_V$. Thus
$$I_n = [I_V]_\beta = [T^{-1}T]_\beta = [T^{-1}]_\gamma^\beta[T]_\beta^\gamma.$$
Similarly, $[T]_\beta^\gamma[T^{-1}]_\gamma^\beta = I_n$. So $[T]_\beta^\gamma$ is invertible and $([T]_\beta^\gamma)^{-1} = [T^{-1}]_\gamma^\beta$.

Now suppose that $A = [T]_\beta^\gamma$ is invertible. Then there exists an $n \times n$ matrix $B$ such that $AB = BA = I_n$. By Theorem 2.6 (p. 72), there exists $U \in \mathcal{L}(W, V)$ such that
$$U(w_j) = \sum_{i=1}^n B_{ij}v_i \quad\text{for } j = 1, 2, \ldots, n,$$
where $\gamma = \{w_1, w_2, \ldots, w_n\}$ and $\beta = \{v_1, v_2, \ldots, v_n\}$. It follows that $[U]_\gamma^\beta = B$. To show that $U = T^{-1}$, observe that
$$[UT]_\beta = [U]_\gamma^\beta[T]_\beta^\gamma = BA = I_n = [I_V]_\beta$$
by Theorem 2.11 (p. 88). So $UT = I_V$, and similarly, $TU = I_W$. ∎
Example 3


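Let $\beta$ and $\gamma$ be the standard ordered bases of $P_1(\mathbb{R})$ and $\mathbb{R}^2$, respectively. For $T$ as in Example 1, we have
$$[T]_\beta^\gamma = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad [T^{-1}]_\gamma^\beta = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}.$$
It can be verified by matrix multiplication that each matrix is the inverse of the other. ◆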
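Theorem 2.18 can be watched in action on the map $T(a + bx) = (a, a + b)$ of Example 1. In the Python sketch below, a polynomial $a + bx$ is represented by its coordinate pair $(a, b)$ relative to $\{1, x\}$ (an assumption of the sketch); the matrices of $T$ and $T^{-1}$ are built column by column from the formulas in Example 1, and their products in both orders are checked against $I_2$. Helper names are our own.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Represent a + bx in P_1(R) by its coordinate pair (a, b).
def T(p):
    a, b = p
    return (a, a + b)          # T(a + bx) = (a, a + b)


def T_inv(v):
    c, d = v
    return (c, d - c)          # T^{-1}(c, d) = c + (d - c)x


def matrix_from_columns(f, n):
    """Column j is f(e_j); with standard bases, coordinates are the entries."""
    cols = [f(tuple(int(i == j) for i in range(n))) for j in range(n)]
    return [list(row) for row in zip(*cols)]


T_mat = matrix_from_columns(T, 2)          # [[1, 0], [1, 1]]
T_inv_mat = matrix_from_columns(T_inv, 2)  # [[1, 0], [-1, 1]]
I2 = [[1, 0], [0, 1]]
print(matmul(T_mat, T_inv_mat) == I2 and matmul(T_inv_mat, T_mat) == I2)  # True
```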


Corollary 1. Let $V$ be a finite-dimensional vector space with an ordered basis $\beta$, and let $T\colon V \to V$ be linear. Then $T$ is invertible if and only if $[T]_\beta$ is invertible. Furthermore, $[T^{-1}]_\beta = ([T]_\beta)^{-1}$.

Proof. Exercise. ∎


Corollary 2. Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if and only if $L_A$ is invertible. Furthermore, $(L_A)^{-1} = L_{A^{-1}}$.


Proof. Exercise. ∎


The notion of invertibility may be used to formalize what may already have been observed by the reader, that is, that certain vector spaces strongly resemble one another except for the form of their vectors. For example, in the case of $M_{2\times 2}(F)$ and $F^4$, if we associate to each matrix
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
the 4-tuple $(a, b, c, d)$, we see that sums and scalar products associate in a similar manner; that is, in terms of the vector space structure, these two vector spaces may be considered identical or isomorphic.


Definitions. Let $V$ and $W$ be vector spaces. We say that $V$ is isomorphic to $W$ if there exists a linear transformation $T\colon V \to W$ that is invertible. Such a linear transformation is called an isomorphism from $V$ onto $W$.


We leave as an exercise (see Exercise 13) the proof that “is isomorphic 
to” is an equivalence relation. (See Appendix A.) So we need only say that 
V and W are isomorphic. 


Example 4 


Define $T\colon F^2 \to P_1(F)$ by $T(a_1, a_2) = a_1 + a_2x$. It is easily checked that $T$ is an isomorphism; so $F^2$ is isomorphic to $P_1(F)$. ◆

Example 5
Define
$$T\colon P_3(\mathbb{R}) \to M_{2\times 2}(\mathbb{R}) \quad\text{by}\quad T(f) = \begin{pmatrix} f(1) & f(2) \\ f(3) & f(4) \end{pmatrix}.$$
It is easily verified that $T$ is linear. By use of the Lagrange interpolation formula in Section 1.6, it can be shown (compare with Exercise 22) that $T(f) = O$ only when $f$ is the zero polynomial. Thus $T$ is one-to-one (see Exercise 11). Moreover, because $\dim(P_3(\mathbb{R})) = \dim(M_{2\times 2}(\mathbb{R}))$, it follows that $T$ is invertible by Theorem 2.5 (p. 71). We conclude that $P_3(\mathbb{R})$ is isomorphic to $M_{2\times 2}(\mathbb{R})$. ◆
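The claim that only the zero polynomial satisfies $f(1) = f(2) = f(3) = f(4) = 0$ can also be checked by row reduction. Assuming the standard basis of $M_{2\times 2}(\mathbb{R})$ is ordered as $E_{11}, E_{12}, E_{21}, E_{22}$, column $j$ of the matrix of $T$ relative to $\{1, x, x^2, x^3\}$ holds $(1^j, 2^j, 3^j, 4^j)$. The Python sketch below (our own `rank` helper, exact arithmetic via the standard-library `fractions` module) finds four pivots, so the homogeneous system has only the trivial solution and $T$ is one-to-one.

```python
from fractions import Fraction


def rank(M):
    """Number of pivots found by Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


# Row i corresponds to evaluation at t = 1, 2, 3, 4; column j to the monomial x^j.
T_matrix = [[t ** j for j in range(4)] for t in (1, 2, 3, 4)]
print(T_matrix)        # [[1, 1, 1, 1], [1, 2, 4, 8], [1, 3, 9, 27], [1, 4, 16, 64]]
print(rank(T_matrix))  # 4
```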


In each of Examples 4 and 5, the reader may have observed that isomor- 
phic vector spaces have equal dimensions. As the next theorem shows, this 
is no coincidence. 


Theorem 2.19. Let V and W be finite-dimensional vector spaces (over 
the same field). Then V is isomorphic to W if and only if dim(V) = dim(W). 


Proof. Suppose that $V$ is isomorphic to $W$ and that $T\colon V \to W$ is an isomorphism from $V$ to $W$. By the lemma preceding Theorem 2.18, we have that $\dim(V) = \dim(W)$.

Now suppose that $\dim(V) = \dim(W)$, and let $\beta = \{v_1, v_2, \ldots, v_n\}$ and $\gamma = \{w_1, w_2, \ldots, w_n\}$ be bases for $V$ and $W$, respectively. By Theorem 2.6 (p. 72), there exists $T\colon V \to W$ such that $T$ is linear and $T(v_i) = w_i$ for $i = 1, 2, \ldots, n$. Using Theorem 2.2 (p. 68), we have
$$R(T) = \operatorname{span}(T(\beta)) = \operatorname{span}(\gamma) = W.$$
So $T$ is onto. From Theorem 2.5 (p. 71), we have that $T$ is also one-to-one. Hence $T$ is an isomorphism. ∎


By the lemma to Theorem 2.18, if V and W are isomorphic, then either 
both of V and W are finite-dimensional or both are infinite-dimensional. 


Corollary. Let $V$ be a vector space over $F$. Then $V$ is isomorphic to $F^n$ if and only if $\dim(V) = n$.


Up to this point, we have associated linear transformations with their 
matrix representations. We are now in a position to prove that, as a vector 
space, the collection of all linear transformations between two given vector 
spaces may be identified with the appropriate vector space of m × n matrices.


Theorem 2.20. Let V and W be finite-dimensional vector spaces over F
of dimensions n and m, respectively, and let β and γ be ordered bases for V
and W, respectively. Then the function $\Phi\colon \mathcal{L}(V, W) \to M_{m\times n}(F)$, defined by
$\Phi(T) = [T]_\beta^\gamma$ for $T \in \mathcal{L}(V, W)$, is an isomorphism.
Proof. By Theorem 2.8 (p. 82), $\Phi$ is linear. Hence we must show that $\Phi$
is one-to-one and onto. This is accomplished if we show that for every m × n
matrix A, there exists a unique linear transformation $T\colon V \to W$ such that
$\Phi(T) = A$. Let $\beta = \{v_1, v_2, \dots, v_n\}$, $\gamma = \{w_1, w_2, \dots, w_m\}$, and let A be a
given m × n matrix. By Theorem 2.6 (p. 72), there exists a unique linear
transformation $T\colon V \to W$ such that
$$T(v_j) = \sum_{i=1}^{m} A_{ij} w_i \quad \text{for } 1 \le j \le n.$$
But this means that $[T]_\beta^\gamma = A$, or $\Phi(T) = A$. Thus $\Phi$ is an isomorphism. ■


Corollary. Let V and W be finite-dimensional vector spaces of dimensions
n and m, respectively. Then $\mathcal{L}(V, W)$ is finite-dimensional of dimension mn.


Proof. The proof follows from Theorems 2.20 and 2.19 and the fact that
$\dim(M_{m\times n}(F)) = mn$. ■
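The map $\Phi$ of Theorem 2.20 can be sketched in code for the simplest case. We assume $V = R^2$, $W = R^3$, and that β and γ are the standard ordered bases, so the coordinates of a vector are just its entries. The function `matrix_of` and the sample maps `T` and `U` are hypothetical illustrations. The final check confirms the linearity of $\Phi$ (Theorem 2.8) for one choice of scalar.

```python
from fractions import Fraction


def matrix_of(T, n):
    """Phi(T) = [T] relative to the standard ordered bases of F^n and F^m.

    Column j is T(e_j); the result is returned as a list of rows."""
    columns = []
    for j in range(n):
        e_j = tuple(Fraction(1) if k == j else Fraction(0) for k in range(n))
        columns.append(T(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


# Two linear maps R^2 -> R^3, chosen only for illustration.
def T(v):
    a1, a2 = v
    return (a1 - 2 * a2, a2, 3 * a1 + 4 * a2)


def U(v):
    a1, a2 = v
    return (a2, a1 + a2, -a1)


c = Fraction(5)


def cT_plus_U(v):
    return tuple(c * t + u for t, u in zip(T(v), U(v)))


lhs = matrix_of(cT_plus_U, 2)                      # Phi(cT + U)
rhs = [[c * x + y for x, y in zip(row_t, row_u)]   # c Phi(T) + Phi(U)
       for row_t, row_u in zip(matrix_of(T, 2), matrix_of(U, 2))]
print(lhs == rhs)  # True
```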


We conclude this section with a result that allows us to see more clearly
the relationship between linear transformations defined on abstract finite-dimensional
vector spaces and linear transformations from $F^n$ to $F^m$.

We begin by naming the transformation $x \to [x]_\beta$ introduced in Section 2.2.


Definition. Let β be an ordered basis for an n-dimensional vector space
V over the field F. The standard representation of V with respect to
β is the function $\phi_\beta\colon V \to F^n$ defined by $\phi_\beta(x) = [x]_\beta$ for each $x \in V$.


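Example 6

Let $\beta = \{(1,0), (0,1)\}$ and $\gamma = \{(1,2), (3,4)\}$. It is easily observed that β
and γ are ordered bases for $R^2$. For $x = (1, -2)$, we have
$$\phi_\beta(x) = [x]_\beta = \begin{pmatrix} 1 \\ -2 \end{pmatrix} \quad\text{and}\quad \phi_\gamma(x) = [x]_\gamma = \begin{pmatrix} -5 \\ 2 \end{pmatrix}.$$
◆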
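Computing a standard representation $\phi_\gamma(x)$ amounts to solving a linear system. The sketch below handles ordered bases of $R^2$ only, using Cramer's rule with exact fractions. The function name `coordinates` is our own.

```python
from fractions import Fraction


def coordinates(basis, x):
    """Return [x]_basis for an ordered basis {u, v} of R^2.

    Solves a*u + b*v = x with Cramer's rule (exact rational arithmetic)."""
    (u1, u2), (v1, v2) = basis
    det = u1 * v2 - v1 * u2
    if det == 0:
        raise ValueError("vectors do not form a basis of R^2")
    a = Fraction(x[0] * v2 - v1 * x[1], det)
    b = Fraction(u1 * x[1] - x[0] * u2, det)
    return (a, b)


beta = [(1, 0), (0, 1)]
gamma = [(1, 2), (3, 4)]
x = (1, -2)
print(coordinates(beta, x) == (1, -2))   # True
print(coordinates(gamma, x) == (-5, 2))  # True
```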
We observed earlier that $\phi_\beta$ is a linear transformation. The next theorem
tells us much more.


Theorem 2.21. For any finite-dimensional vector space V with ordered
basis β, $\phi_\beta$ is an isomorphism.


Proof. Exercise. ■


This theorem provides us with an alternate proof that an n-dimensional
vector space is isomorphic to $F^n$ (see the corollary to Theorem 2.19).
Figure 2.2 (diagram of the maps $T\colon V \to W$, $\phi_\beta\colon V \to F^n$, $\phi_\gamma\colon W \to F^m$, and $L_A\colon F^n \to F^m$)


Let V and W be vector spaces of dimension n and m, respectively, and let
$T\colon V \to W$ be a linear transformation. Define $A = [T]_\beta^\gamma$, where β and γ are
arbitrary ordered bases of V and W, respectively. We are now able to use $\phi_\beta$
and $\phi_\gamma$ to study the relationship between the linear transformations T and
$L_A\colon F^n \to F^m$.

Let us first consider Figure 2.2. Notice that there are two composites of
linear transformations that map V into $F^m$:


1. Map V into $F^n$ with $\phi_\beta$ and follow this transformation with $L_A$; this
yields the composite $L_A\phi_\beta$.
2. Map V into W with T and follow it by $\phi_\gamma$ to obtain the composite $\phi_\gamma T$.


These two composites are depicted by the dashed arrows in the diagram.
By a simple reformulation of Theorem 2.14 (p. 91), we may conclude that
$$L_A\phi_\beta = \phi_\gamma T;$$
that is, the diagram “commutes.” Heuristically, this relationship indicates
that after V and W are identified with $F^n$ and $F^m$ via $\phi_\beta$ and $\phi_\gamma$, respectively,
we may “identify” T with $L_A$. This diagram allows us to transfer operations
on abstract vector spaces to ones on $F^n$ and $F^m$.


Example 7

Recall the linear transformation $T\colon P_3(R) \to P_2(R)$ defined in Example 4 of
Section 2.2 ($T(f(x)) = f'(x)$). Let β and γ be the standard ordered bases for
$P_3(R)$ and $P_2(R)$, respectively, and let $\phi_\beta\colon P_3(R) \to R^4$ and $\phi_\gamma\colon P_2(R) \to R^3$
be the corresponding standard representations of $P_3(R)$ and $P_2(R)$. If
$A = [T]_\beta^\gamma$, then
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$
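Consider the polynomial $p(x) = 2 + x - 3x^2 + 5x^3$. We show that $L_A\phi_\beta(p(x)) =
\phi_\gamma T(p(x))$. Now
$$L_A\phi_\beta(p(x)) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \\ -3 \\ 5 \end{pmatrix} = \begin{pmatrix} 1 \\ -6 \\ 15 \end{pmatrix}.$$
But since $T(p(x)) = p'(x) = 1 - 6x + 15x^2$, we have
$$\phi_\gamma T(p(x)) = \begin{pmatrix} 1 \\ -6 \\ 15 \end{pmatrix}.$$
So $L_A\phi_\beta(p(x)) = \phi_\gamma T(p(x))$. ◆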
Try repeating Example 7 with different polynomials p(x). 
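Example 7 is easy to repeat by machine. The sketch below makes one representation choice of ours: a polynomial $a_0 + a_1x + \cdots$ is stored as the list $[a_0, a_1, \dots]$, which is exactly its coordinate vector relative to the standard ordered basis. With that choice, $\phi_\beta$ and $\phi_\gamma$ act as the identity on lists. The helper names are hypothetical.

```python
def derivative(coeffs):
    """T(p) = p' on coefficient lists: [a0, a1, a2, a3] -> [a1, 2*a2, 3*a3]."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]


def mat_vec(A, v):
    """Left-multiplication L_A applied to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# A = [T]_beta^gamma for T: P3(R) -> P2(R), T(p) = p'.
A = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]

p = [2, 1, -3, 5]        # p(x) = 2 + x - 3x^2 + 5x^3
left = mat_vec(A, p)     # L_A phi_beta (p)
right = derivative(p)    # phi_gamma T (p)
print(left, right)       # [1, -6, 15] [1, -6, 15]
```

Changing `p` to another coefficient list gives another instance of the commuting diagram.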


EXERCISES 


1. Label the following statements as true or false. In each part, V and
W are vector spaces with ordered (finite) bases α and β, respectively,
$T\colon V \to W$ is linear, and A and B are matrices.

(a) $([T]_\alpha^\beta)^{-1} = [T^{-1}]_\alpha^\beta$.
(b) T is invertible if and only if T is one-to-one and onto.
(c) $T = L_A$, where $A = [T]_\alpha^\beta$.
(d) $M_{2\times 3}(F)$ is isomorphic to $F^5$.
(e) $P_n(F)$ is isomorphic to $P_m(F)$ if and only if $n = m$.
(f) $AB = I$ implies that A and B are invertible.
(g) If A is invertible, then $(A^{-1})^{-1} = A$.
(h) A is invertible if and only if $L_A$ is invertible.
(i) A must be square in order to possess an inverse.


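2. For each of the following linear transformations T, determine whether
T is invertible and justify your answer.

(a) $T\colon R^2 \to R^3$ defined by $T(a_1, a_2) = (a_1 - 2a_2, a_2, 3a_1 + 4a_2)$.
(b) $T\colon R^2 \to R^3$ defined by $T(a_1, a_2) = (3a_1 - a_2, a_2, 4a_1)$.
(c) $T\colon R^3 \to R^3$ defined by $T(a_1, a_2, a_3) = (3a_1 - 2a_3, a_2, 3a_1 + 4a_2)$.
(d) $T\colon P_3(R) \to P_2(R)$ defined by $T(p(x)) = p'(x)$.
(e) $T\colon M_{2\times 2}(R) \to P_2(R)$ defined by $T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = a + 2bx + (c + d)x^2$.
(f) $T\colon M_{2\times 2}(R) \to M_{2\times 2}(R)$ defined by $T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + b & a \\ c & c + d \end{pmatrix}$.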
3. Which of the following pairs of vector spaces are isomorphic? Justify
your answers.
(a) $F^3$ and $P_3(F)$.
(b) $F^4$ and $P_3(F)$.
(c) $M_{2\times 2}(R)$ and $P_3(R)$.
(d) $V = \{A \in M_{2\times 2}(R) : \operatorname{tr}(A) = 0\}$ and $R^4$.

4. Let A and B be n × n invertible matrices. Prove that AB is invertible
and $(AB)^{-1} = B^{-1}A^{-1}$.

5. Let A be invertible. Prove that $A^t$ is invertible and $(A^t)^{-1} = (A^{-1})^t$.

6. Prove that if A is invertible and AB = O, then B = O.

7. Let A be an n × n matrix.
(a) Suppose that $A^2 = O$. Prove that A is not invertible.
(b) Suppose that AB = O for some nonzero n × n matrix B. Could A
be invertible? Explain.

8. Prove Corollaries 1 and 2 of Theorem 2.18.

9. Let A and B be n × n matrices such that AB is invertible. Prove that A
and B are invertible. Give an example to show that arbitrary matrices
A and B need not be invertible if AB is invertible.

10.† Let A and B be n × n matrices such that $AB = I_n$.
(a) Use Exercise 9 to conclude that A and B are invertible.
(b) Prove $A = B^{-1}$ (and hence $B = A^{-1}$). (We are, in effect, saying
that for square matrices, a “one-sided” inverse is a “two-sided”
inverse.)
(c) State and prove analogous results for linear transformations defined
on finite-dimensional vector spaces.

11. Verify that the transformation in Example 5 is one-to-one.

12. Prove Theorem 2.21.

13. Let ∼ mean “is isomorphic to.” Prove that ∼ is an equivalence relation
on the class of vector spaces over F.

14. Let
$$V = \left\{ \begin{pmatrix} a & a + b \\ 0 & c \end{pmatrix} : a, b, c \in F \right\}.$$
Construct an isomorphism from V to $F^3$.
15. Let V and W be finite-dimensional vector spaces, and let $T\colon V \to W$ be
a linear transformation. Suppose that β is a basis for V. Prove that T
is an isomorphism if and only if T(β) is a basis for W.

16. Let B be an n × n invertible matrix. Define $\Phi\colon M_{n\times n}(F) \to M_{n\times n}(F)$
by $\Phi(A) = B^{-1}AB$. Prove that $\Phi$ is an isomorphism.


17.† Let V and W be finite-dimensional vector spaces and $T\colon V \to W$ be an
isomorphism. Let $V_0$ be a subspace of V.
(a) Prove that $T(V_0)$ is a subspace of W.
(b) Prove that $\dim(V_0) = \dim(T(V_0))$.

18. Repeat Example 7 with the polynomial $p(x) = 1 + x + 2x^2 + x^3$.

19. In Example 5 of Section 2.1, the mapping $T\colon M_{2\times 2}(R) \to M_{2\times 2}(R)$ defined
by $T(M) = M^t$ for each $M \in M_{2\times 2}(R)$ is a linear transformation.
Let $\beta = \{E^{11}, E^{12}, E^{21}, E^{22}\}$, which is a basis for $M_{2\times 2}(R)$, as noted in
Example 3 of Section 1.6.
(a) Compute $[T]_\beta$.
(b) Verify that $L_A\phi_\beta(M) = \phi_\beta T(M)$ for $A = [T]_\beta$ and
$$M = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$


20.† Let $T\colon V \to W$ be a linear transformation from an n-dimensional vector
space V to an m-dimensional vector space W. Let β and γ be ordered
bases for V and W, respectively. Prove that $\operatorname{rank}(T) = \operatorname{rank}(L_A)$ and
that $\operatorname{nullity}(T) = \operatorname{nullity}(L_A)$, where $A = [T]_\beta^\gamma$. Hint: Apply Exercise 17
to Figure 2.2.

21. Let V and W be finite-dimensional vector spaces with ordered bases
$\beta = \{v_1, v_2, \dots, v_n\}$ and $\gamma = \{w_1, w_2, \dots, w_m\}$, respectively. By Theorem 2.6
(p. 72), there exist linear transformations $T_{ij}\colon V \to W$ such
that
$$T_{ij}(v_k) = \begin{cases} w_i & \text{if } k = j, \\ 0 & \text{if } k \ne j. \end{cases}$$
First prove that $\{T_{ij} : 1 \le i \le m,\ 1 \le j \le n\}$ is a basis for $\mathcal{L}(V, W)$.
Then let $M^{ij}$ be the m × n matrix with 1 in the ith row and jth column
and 0 elsewhere, and prove that $[T_{ij}]_\beta^\gamma = M^{ij}$. Again by Theorem 2.6,
there exists a linear transformation $\Phi\colon \mathcal{L}(V, W) \to M_{m\times n}(F)$ such that
$\Phi(T_{ij}) = M^{ij}$. Prove that $\Phi$ is an isomorphism.
22. Let $c_0, c_1, \dots, c_n$ be distinct scalars from an infinite field F. Define
$T\colon P_n(F) \to F^{n+1}$ by $T(f) = (f(c_0), f(c_1), \dots, f(c_n))$. Prove that T is
an isomorphism. Hint: Use the Lagrange polynomials associated with
$c_0, c_1, \dots, c_n$.

23. Let V denote the vector space defined in Example 5 of Section 1.2, and
let $W = P(F)$. Define
$$T\colon V \to W \quad\text{by}\quad T(\sigma) = \sum_{i=0}^{n} \sigma(i)x^i,$$
where n is the largest integer such that $\sigma(n) \ne 0$. Prove that T is an
isomorphism.


The following exercise requires familiarity with the concept of quotient space 
defined in Exercise 31 of Section 1.3 and with Exercise 40 of Section 2.1. 


24. Let $T\colon V \to Z$ be a linear transformation of a vector space V onto a
vector space Z. Define the mapping
$$\overline{T}\colon V/N(T) \to Z \quad\text{by}\quad \overline{T}(v + N(T)) = T(v)$$
for any coset $v + N(T)$ in $V/N(T)$.
(a) Prove that $\overline{T}$ is well-defined; that is, prove that if $v + N(T) =
v' + N(T)$, then $T(v) = T(v')$.
(b) Prove that $\overline{T}$ is linear.
(c) Prove that $\overline{T}$ is an isomorphism.
(d) Prove that the diagram shown in Figure 2.3 commutes; that is,
prove that $T = \overline{T}\eta$.

Figure 2.3

25. Let V be a nonzero vector space over a field F, and suppose that S is
a basis for V. (By the corollary to Theorem 1.13 (p. 60) in Section 1.7,
every vector space has a basis.) Let C(S, F) denote the vector space of
all functions $f \in \mathcal{F}(S, F)$ such that $f(s) = 0$ for all but a finite number
of vectors in S. (See Exercise 14 of Section 1.3.) Let $\Psi\colon C(S, F) \to V$
be the function defined by
$$\Psi(f) = \sum_{s \in S,\ f(s) \ne 0} f(s)s.$$
Prove that $\Psi$ is an isomorphism. Thus every nonzero vector space can
be viewed as a space of functions.


2.5 THE CHANGE OF COORDINATE MATRIX 


In many areas of mathematics, a change of variable is used to simplify the
appearance of an expression. For example, in calculus an antiderivative of
$2xe^{x^2}$ can be found by making the change of variable $u = x^2$. The resulting
expression is of such a simple form that an antiderivative is easily recognized:
$$\int 2xe^{x^2}\,dx = \int e^u\,du = e^u + C = e^{x^2} + C.$$


Similarly, in geometry the change of variable
$$x = \frac{2}{\sqrt{5}}x' - \frac{1}{\sqrt{5}}y', \qquad y = \frac{1}{\sqrt{5}}x' + \frac{2}{\sqrt{5}}y'$$
can be used to transform the equation $2x^2 - 4xy + 5y^2 = 1$ into the simpler
equation $(x')^2 + 6(y')^2 = 1$, in which form it is easily seen to be the equation of
an ellipse. (See Figure 2.4.) We see how this change of variable is determined
in Section 6.5. Geometrically, the change of variable
$$\begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} x' \\ y' \end{pmatrix}$$


is a change in the way that the position of a point P in the plane is described.
This is done by introducing a new frame of reference, an $x'y'$-coordinate
system with coordinate axes rotated from the original $xy$-coordinate axes. In
this case, the new coordinate axes are chosen to lie in the direction of the
axes of the ellipse. The unit vectors along the $x'$-axis and the $y'$-axis form an
ordered basis
$$\beta' = \left\{ \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix}, \frac{1}{\sqrt{5}}\begin{pmatrix} -1 \\ 2 \end{pmatrix} \right\}$$
for $R^2$, and the change of variable is actually a change from $[P]_\beta = \begin{pmatrix} x \\ y \end{pmatrix}$, the
coordinate vector of P relative to the standard ordered basis $\beta = \{e_1, e_2\}$, to
$[P]_{\beta'} = \begin{pmatrix} x' \\ y' \end{pmatrix}$, the coordinate vector of P relative to the new rotated basis β'.


Figure 2.4 
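The following is a numerical spot check of the change of variable for the ellipse. It is an illustration, not a proof. It substitutes $x = \frac{2}{\sqrt5}x' - \frac{1}{\sqrt5}y'$ and $y = \frac{1}{\sqrt5}x' + \frac{2}{\sqrt5}y'$ at a few sample points and compares $2x^2 - 4xy + 5y^2$ with $(x')^2 + 6(y')^2$. The columns of the matrix `Q` are the vectors of β' written in standard coordinates. The random seed and the sample range are arbitrary choices of ours.

```python
import math
import random

s = 1 / math.sqrt(5)
# Columns of Q are the vectors of beta' in standard coordinates.
Q = [[2 * s, -s],
     [s, 2 * s]]


def to_standard(xp, yp):
    """Apply the change of variable: (x', y') -> (x, y) = Q (x', y')."""
    return (Q[0][0] * xp + Q[0][1] * yp,
            Q[1][0] * xp + Q[1][1] * yp)


def old_form(x, y):
    return 2 * x * x - 4 * x * y + 5 * y * y


def new_form(xp, yp):
    return xp * xp + 6 * yp * yp


random.seed(0)
checks = []
for _ in range(5):
    xp, yp = random.uniform(-3, 3), random.uniform(-3, 3)
    x, y = to_standard(xp, yp)
    checks.append(math.isclose(old_form(x, y), new_form(xp, yp),
                               rel_tol=1e-9, abs_tol=1e-12))
print(all(checks))  # True
```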


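A natural question arises: How can a coordinate vector relative to one basis
be changed into a coordinate vector relative to the other? Notice that the
system of equations relating the new and old coordinates can be represented
by the matrix equation
$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix}.$$
Notice also that the matrix
$$Q = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}$$
equals $[I]_{\beta'}^{\beta}$, where I denotes the identity transformation on $R^2$. Thus $[v]_\beta =
Q[v]_{\beta'}$ for all $v \in R^2$. A similar result is true in general.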
Theorem 2.22. Let β and β' be two ordered bases for a finite-dimensional
vector space V, and let $Q = [I_V]_{\beta'}^{\beta}$. Then
(a) Q is invertible.
(b) For any $v \in V$, $[v]_\beta = Q[v]_{\beta'}$.


Proof. (a) Since $I_V$ is invertible, Q is invertible by Theorem 2.18 (p. 101).
(b) For any $v \in V$,
$$[v]_\beta = [I_V(v)]_\beta = [I_V]_{\beta'}^{\beta}[v]_{\beta'} = Q[v]_{\beta'}$$
by Theorem 2.14 (p. 91). ■
The matrix $Q = [I_V]_{\beta'}^{\beta}$ defined in Theorem 2.22 is called a change of coordinate
matrix. Because of part (b) of the theorem, we say that Q changes
β'-coordinates into β-coordinates. Observe that if $\beta = \{x_1, x_2, \dots, x_n\}$
and $\beta' = \{x'_1, x'_2, \dots, x'_n\}$, then
$$x'_j = \sum_{i=1}^{n} Q_{ij} x_i$$
for $j = 1, 2, \dots, n$; that is, the jth column of Q is $[x'_j]_\beta$.

Notice that if Q changes β'-coordinates into β-coordinates, then $Q^{-1}$
changes β-coordinates into β'-coordinates. (See Exercise 11.)


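Example 1
In $R^2$, let $\beta = \{(1,1), (1,-1)\}$ and $\beta' = \{(2,4), (3,1)\}$. Since
$$(2,4) = 3(1,1) - 1(1,-1) \quad\text{and}\quad (3,1) = 2(1,1) + 1(1,-1),$$
the matrix that changes β'-coordinates into β-coordinates is
$$Q = \begin{pmatrix} 3 & 2 \\ -1 & 1 \end{pmatrix}.$$
Thus, for instance,
$$[(2,4)]_\beta = Q[(2,4)]_{\beta'} = Q\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \end{pmatrix}.$$
◆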
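The construction in Example 1, where column j of Q is $[x'_j]_\beta$, can be sketched as follows for bases of $R^2$. `coords` and `change_of_coordinates` are helper names of ours. Cramer's rule is simply a convenient way to solve the 2 × 2 systems.

```python
from fractions import Fraction


def coords(basis, v):
    """Coordinates of v relative to an ordered basis {u, w} of R^2 (Cramer's rule)."""
    (u1, u2), (w1, w2) = basis
    d = u1 * w2 - w1 * u2
    return (Fraction(v[0] * w2 - w1 * v[1], d),
            Fraction(u1 * v[1] - v[0] * u2, d))


def change_of_coordinates(beta, beta_prime):
    """Q = [I]_{beta'}^{beta}: column j is [x'_j]_beta."""
    cols = [coords(beta, v) for v in beta_prime]
    return [[cols[j][i] for j in range(2)] for i in range(2)]


beta = [(1, 1), (1, -1)]
beta_prime = [(2, 4), (3, 1)]
Q = change_of_coordinates(beta, beta_prime)
print(Q == [[3, 2], [-1, 1]])  # True

# [(2, 4)]_{beta'} = (1, 0); multiplying by Q should give [(2, 4)]_beta = (3, -1).
v_bp = (1, 0)
v_b = tuple(sum(Q[i][j] * v_bp[j] for j in range(2)) for i in range(2))
print(v_b == coords(beta, (2, 4)))  # True
```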


For the remainder of this section, we consider only linear transformations
that map a vector space V into itself. Such a linear transformation is called a
linear operator on V. Suppose now that T is a linear operator on a finite-dimensional
vector space V and that β and β' are ordered bases for V. Then
T can be represented by the matrices $[T]_\beta$ and $[T]_{\beta'}$. What is the relationship
between these matrices? The next theorem provides a simple answer using a
change of coordinate matrix.


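Theorem 2.23. Let T be a linear operator on a finite-dimensional vector
space V, and let β and β' be ordered bases for V. Suppose that Q is the
change of coordinate matrix that changes β'-coordinates into β-coordinates.
Then
$$[T]_{\beta'} = Q^{-1}[T]_\beta Q.$$


Proof. Let I be the identity transformation on V. Then T = IT = TI;
hence, by Theorem 2.11 (p. 88),
$$Q[T]_{\beta'} = [I]_{\beta'}^{\beta}[T]_{\beta'}^{\beta'} = [IT]_{\beta'}^{\beta} = [TI]_{\beta'}^{\beta} = [T]_{\beta}^{\beta}[I]_{\beta'}^{\beta} = [T]_\beta Q.$$
Since Q is invertible by Theorem 2.22, multiplying on the left by $Q^{-1}$ gives $[T]_{\beta'} = Q^{-1}[T]_\beta Q$. ■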
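Example 2
Let T be the linear operator on $R^2$ defined by
$$T\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 3a - b \\ a + 3b \end{pmatrix},$$
and let β and β' be the ordered bases in Example 1. The reader should verify
that
$$[T]_\beta = \begin{pmatrix} 3 & 1 \\ -1 & 3 \end{pmatrix}.$$
In Example 1, we saw that the change of coordinate matrix that changes
β'-coordinates into β-coordinates is
$$Q = \begin{pmatrix} 3 & 2 \\ -1 & 1 \end{pmatrix},$$
and it is easily verified that
$$Q^{-1} = \frac{1}{5}\begin{pmatrix} 1 & -2 \\ 1 & 3 \end{pmatrix}.$$
Hence, by Theorem 2.23,
$$[T]_{\beta'} = Q^{-1}[T]_\beta Q = \begin{pmatrix} 4 & 1 \\ -2 & 2 \end{pmatrix}.$$
To show that this is the correct matrix, we can verify that the image
under T of each vector of β' is the linear combination of the vectors of β'
with the entries of the corresponding column as its coefficients. For example,
the image of the second vector in β' is
$$T\begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 8 \\ 6 \end{pmatrix} = 1\begin{pmatrix} 2 \\ 4 \end{pmatrix} + 2\begin{pmatrix} 3 \\ 1 \end{pmatrix}.$$
Notice that the coefficients of the linear combination are the entries of the
second column of $[T]_{\beta'}$.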
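Theorem 2.23 applied to Example 2 can be checked with exact arithmetic. The sketch below hard-codes $[T]_\beta$ and Q from the text. `matmul` and `inverse_2x2` are small helpers of ours, not library functions.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


T_beta = [[3, 1], [-1, 3]]   # [T]_beta from Example 2
Q = [[3, 2], [-1, 1]]        # changes beta'-coordinates into beta-coordinates
T_beta_prime = matmul(matmul(inverse_2x2(Q), T_beta), Q)
print(T_beta_prime == [[4, 1], [-2, 2]])  # True


def T(v):
    a, b = v
    return (3 * a - b, a + 3 * b)


# Second column of [T]_beta' is (1, 2): T(3, 1) should equal 1*(2, 4) + 2*(3, 1).
print(T((3, 1)) == (1 * 2 + 2 * 3, 1 * 4 + 2 * 1))  # True
```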


It is often useful to apply Theorem 2.23 to compute $[T]_\beta$, as the next
example shows.


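Example 3

Recall the reflection about the x-axis in Example 3 of Section 2.1. The rule
$(x, y) \to (x, -y)$ is easy to obtain. We now derive the less obvious rule for
the reflection T about the line $y = 2x$. (See Figure 2.5.)

Figure 2.5

We wish to find an
expression for $T(a, b)$ for any $(a, b)$ in $R^2$. Since T is linear, it is completely
determined by its values on a basis for $R^2$. Clearly, $T(1, 2) = (1, 2)$ and
$T(-2, 1) = -(-2, 1) = (2, -1)$. Therefore if we let
$$\beta' = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\},$$
then β' is an ordered basis for $R^2$ and
$$[T]_{\beta'} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Let β be the standard ordered basis for $R^2$, and let Q be the matrix that
changes β'-coordinates into β-coordinates. Then
$$Q = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}$$
and $Q^{-1}[T]_\beta Q = [T]_{\beta'}$. We can solve this equation for $[T]_\beta$ to obtain that
$[T]_\beta = Q[T]_{\beta'}Q^{-1}$. Because
$$Q^{-1} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix},$$
the reader can verify that
$$[T]_\beta = \frac{1}{5}\begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix}.$$
Since β is the standard ordered basis, it follows that T is left-multiplication
by $[T]_\beta$. Thus for any $(a, b)$ in $R^2$, we have
$$T\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{5}\begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{5}\begin{pmatrix} -3a + 4b \\ 4a + 3b \end{pmatrix}.$$
◆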
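The computation $[T]_\beta = Q[T]_{\beta'}Q^{-1}$ for the reflection about $y = 2x$ can be reproduced as follows. Here `Q_inv` is copied from the text rather than computed. The last lines confirm that the resulting map fixes (1, 2) and negates (−2, 1). The helper names are ours.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


Q = [[1, -2], [2, 1]]                 # columns: (1, 2) on the line, (-2, 1) perpendicular to it
Q_inv = [[Fraction(1, 5), Fraction(2, 5)],
         [Fraction(-2, 5), Fraction(1, 5)]]
D = [[1, 0], [0, -1]]                 # [T]_beta'
T_std = matmul(matmul(Q, D), Q_inv)   # [T]_beta = Q [T]_beta' Q^{-1}


def reflect(a, b):
    """T(a, b) as left-multiplication by [T]_beta (beta is the standard basis)."""
    return tuple(T_std[i][0] * a + T_std[i][1] * b for i in range(2))


print(T_std == [[Fraction(-3, 5), Fraction(4, 5)],
                [Fraction(4, 5), Fraction(3, 5)]])         # True
print(reflect(1, 2) == (1, 2), reflect(-2, 1) == (2, -1))  # True True
```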
A useful special case of Theorem 2.23 is contained in the next corollary,
whose proof is left as an exercise.


Corollary. Let $A \in M_{n\times n}(F)$, and let γ be an ordered basis for $F^n$. Then
$[L_A]_\gamma = Q^{-1}AQ$, where Q is the n × n matrix whose jth column is the jth
vector of γ.


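Example 4
Let
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 3 \\ 0 & -1 & 0 \end{pmatrix},$$
and let
$$\gamma = \left\{ \begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\},$$
which is an ordered basis for $R^3$. Let Q be the 3 × 3 matrix whose jth column
is the jth vector of γ. Then
$$Q = \begin{pmatrix} -1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad Q^{-1} = \begin{pmatrix} -1 & 2 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}.$$
So by the preceding corollary,
$$[L_A]_\gamma = Q^{-1}AQ = \begin{pmatrix} 0 & 2 & 8 \\ -1 & 4 & 6 \\ 0 & -1 & -1 \end{pmatrix}.$$
◆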
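The corollary reduces $[L_A]_\gamma$ to a matrix product. The sketch below recomputes Example 4 and inverts Q by Gauss-Jordan elimination over the rationals. The helper names are ours, and the elimination routine is a plain textbook version, not tuned for large matrices.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(M):
    """Gauss-Jordan inverse over the rationals; raises ValueError if M is singular."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]          # scale the pivot row to get a leading 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                    # right half is M^{-1}


A = [[2, 1, 0], [1, 1, 3], [0, -1, 0]]
gamma = [(-1, 0, 0), (2, 1, 0), (1, 1, 1)]
# Q has the vectors of gamma as its columns.
Q = [[gamma[j][i] for j in range(3)] for i in range(3)]
result = matmul(matmul(inverse(Q), A), Q)
print(result == [[0, 2, 8], [-1, 4, 6], [0, -1, -1]])  # True
```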


The relationship between the matrices $[T]_{\beta'}$ and $[T]_\beta$ in Theorem 2.23 will
be the subject of further study in Chapters 5, 6, and 7. At this time, however,
we introduce the name for this relationship.


Definition. Let A and B be matrices in $M_{n\times n}(F)$. We say that B is
similar to A if there exists an invertible matrix Q such that $B = Q^{-1}AQ$.


Observe that the relation of similarity is an equivalence relation (see Exercise 9).
So we need only say that A and B are similar.

Notice also that in this terminology Theorem 2.23 can be stated as follows:
If T is a linear operator on a finite-dimensional vector space V, and if β and
β' are any ordered bases for V, then $[T]_{\beta'}$ is similar to $[T]_\beta$.

Theorem 2.23 can be generalized to allow $T\colon V \to W$, where V is distinct
from W. In this case, we can change bases in V as well as in W (see Exercise 8).
EXERCISES


1. Label the following statements as true or false.

(a) Suppose that $\beta = \{x_1, x_2, \dots, x_n\}$ and $\beta' = \{x'_1, x'_2, \dots, x'_n\}$ are
ordered bases for a vector space and Q is the change of coordinate
matrix that changes β'-coordinates into β-coordinates. Then the
jth column of Q is $[x_j]_{\beta'}$.
(b) Every change of coordinate matrix is invertible.
(c) Let T be a linear operator on a finite-dimensional vector space V,
let β and β' be ordered bases for V, and let Q be the change of
coordinate matrix that changes β'-coordinates into β-coordinates.
Then $[T]_\beta = Q[T]_{\beta'}Q^{-1}$.
(d) The matrices $A, B \in M_{n\times n}(F)$ are called similar if $B = Q^tAQ$ for
some $Q \in M_{n\times n}(F)$.
(e) Let T be a linear operator on a finite-dimensional vector space V.
Then for any ordered bases β and γ for V, $[T]_\beta$ is similar to $[T]_\gamma$.


2. For each of the following pairs of ordered bases β and β' for $R^2$, find
the change of coordinate matrix that changes β'-coordinates into
β-coordinates.

(a) $\beta = \{e_1, e_2\}$ and $\beta' = \{(a_1, a_2), (b_1, b_2)\}$
(b) $\beta = \{(-1, 3), (2, -1)\}$ and $\beta' = \{(0, 10), (5, 0)\}$
(c) $\beta = \{(2, 5), (-1, -3)\}$ and $\beta' = \{e_1, e_2\}$
(d) $\beta = \{(-4, 3), (2, -1)\}$ and $\beta' = \{(2, 1), (-4, 1)\}$


3. For each of the following pairs of ordered bases β and β' for $P_2(R)$,
find the change of coordinate matrix that changes β'-coordinates into
β-coordinates.

(a) $\beta = \{x^2, x, 1\}$ and
$\beta' = \{a_2x^2 + a_1x + a_0,\ b_2x^2 + b_1x + b_0,\ c_2x^2 + c_1x + c_0\}$
(b) $\beta = \{1, x, x^2\}$ and
$\beta' = \{a_2x^2 + a_1x + a_0,\ b_2x^2 + b_1x + b_0,\ c_2x^2 + c_1x + c_0\}$
(c) $\beta = \{2x^2 - x, 3x^2 + 1, x^2\}$ and $\beta' = \{1, x, x^2\}$
(d) $\beta = \{x^2 - x + 1, x + 1, x^2 + 1\}$ and
$\beta' = \{x^2 + x + 4, 4x^2 - 3x + 2, 2x^2 + 3\}$
(e) $\beta = \{x^2 - x, x^2 + 1, x - 1\}$ and
$\beta' = \{5x^2 - 2x - 3, -2x^2 + 5x + 5, 2x^2 - x - 3\}$
(f) $\beta = \{2x^2 - x + 1, x^2 + 3x - 2, -x^2 + 2x + 1\}$ and
$\beta' = \{9x - 9, x^2 + 21x - 2, 3x^2 + 5x + 2\}$


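4. Let T be the linear operator on $R^2$ defined by
$$T\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 2a + b \\ a - 3b \end{pmatrix},$$
let β be the standard ordered basis for $R^2$, and let
$$\beta' = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}.$$
Use Theorem 2.23 and the fact that
$$\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$$
to find $[T]_{\beta'}$.


5. Let T be the linear operator on $P_1(R)$ defined by $T(p(x)) = p'(x)$,
the derivative of p(x). Let $\beta = \{1, x\}$ and $\beta' = \{1 + x, 1 - x\}$. Use
Theorem 2.23 and the fact that
$$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} \end{pmatrix}$$
to find $[T]_{\beta'}$.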


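6. For each matrix A and ordered basis β, find $[L_A]_\beta$. Also, find an invertible
matrix Q such that $[L_A]_\beta = Q^{-1}AQ$.

(a) $A = \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix}$ and $\beta = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}$
(b) $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ and $\beta = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}$
(c) $A = \begin{pmatrix} 1 & 1 & -1 \\ 2 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$ and $\beta = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} \right\}$
(d) $A = \begin{pmatrix} 13 & 1 & 4 \\ 1 & 13 & 4 \\ 4 & 4 & 10 \end{pmatrix}$ and $\beta = \left\{ \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}$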


7. In $R^2$, let L be the line $y = mx$, where $m \ne 0$. Find an expression for
T(x, y), where
(a) T is the reflection of $R^2$ about L.
(b) T is the projection on L along the line perpendicular to L. (See
the definition of projection in the exercises of Section 2.1.)


8. Prove the following generalization of Theorem 2.23. Let $T\colon V \to W$ be
a linear transformation from a finite-dimensional vector space V to a
finite-dimensional vector space W. Let β and β' be ordered bases for
V, and let γ and γ' be ordered bases for W. Then $[T]_{\beta'}^{\gamma'} = P^{-1}[T]_\beta^\gamma Q$,
where Q is the matrix that changes β'-coordinates into β-coordinates
and P is the matrix that changes γ'-coordinates into γ-coordinates.


9. Prove that “is similar to” is an equivalence relation on $M_{n\times n}(F)$.


10. Prove that if A and B are similar n × n matrices, then $\operatorname{tr}(A) = \operatorname{tr}(B)$.
Hint: Use Exercise 13 of Section 2.3.


11. Let V be a finite-dimensional vector space with ordered bases α, β,
and γ.
(a) Prove that if Q and R are the change of coordinate matrices that
change α-coordinates into β-coordinates and β-coordinates into
γ-coordinates, respectively, then RQ is the change of coordinate
matrix that changes α-coordinates into γ-coordinates.
(b) Prove that if Q changes α-coordinates into β-coordinates, then
$Q^{-1}$ changes β-coordinates into α-coordinates.


12. Prove the corollary to Theorem 2.23.


13.† Let V be a finite-dimensional vector space over a field F, and let
$\beta = \{x_1, x_2, \dots, x_n\}$ be an ordered basis for V. Let Q be an n × n invertible
matrix with entries from F. Define
$$x'_j = \sum_{i=1}^{n} Q_{ij} x_i \quad \text{for } 1 \le j \le n,$$
and set $\beta' = \{x'_1, x'_2, \dots, x'_n\}$. Prove that β' is a basis for V and hence
that Q is the change of coordinate matrix changing β'-coordinates into
β-coordinates.


14. Prove the converse of Exercise 8: If A and B are each m × n matrices
with entries from a field F, and if there exist invertible m × m and n × n
matrices P and Q, respectively, such that $B = P^{-1}AQ$, then there exist
an n-dimensional vector space V and an m-dimensional vector space W
(both over F), ordered bases β and β' for V and γ and γ' for W, and a
linear transformation $T\colon V \to W$ such that
$$A = [T]_\beta^\gamma \quad\text{and}\quad B = [T]_{\beta'}^{\gamma'}.$$
Hints: Let $V = F^n$, $W = F^m$, $T = L_A$, and let β and γ be the standard
ordered bases for $F^n$ and $F^m$, respectively. Now apply the results of
Exercise 13 to obtain ordered bases β' and γ' from β and γ via Q and
P, respectively.
2.6* DUAL SPACES


In this section, we are concerned exclusively with linear transformations from
a vector space V into its field of scalars F, which is itself a vector space of dimension
1 over F. Such a linear transformation is called a linear functional
on V. We generally use the letters f, g, h, ... to denote linear functionals. As
we see in Example 1, the definite integral provides us with one of the most
important examples of a linear functional in mathematics.


Example 1

Let V be the vector space of continuous real-valued functions on the interval
$[0, 2\pi]$. Fix a function $g \in V$. The function $h\colon V \to R$ defined by
$$h(x) = \frac{1}{2\pi}\int_0^{2\pi} x(t)g(t)\,dt$$
is a linear functional on V. In the cases that g(t) equals sin nt or cos nt, h(x)
is often called the nth Fourier coefficient of x.
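Example 1 can be explored numerically. The sketch below approximates h with the composite trapezoidal rule for $g(t) = \sin t$. The quadrature rule, the step count `n`, and the sample functions are our own choices. For $x(t) = 3\sin t + \cos t$ the exact value is 3/2. The last check illustrates the linearity of h on two sample functions, up to floating-point rounding.

```python
import math


def h(x, g, n=2000):
    """Approximate h(x) = (1/(2*pi)) * integral_0^{2*pi} x(t) g(t) dt
    with the composite trapezoidal rule on n subintervals."""
    a, b = 0.0, 2 * math.pi
    step = (b - a) / n

    def f(t):
        return x(t) * g(t)

    total = 0.5 * (f(a) + f(b)) + sum(f(a + k * step) for k in range(1, n))
    return total * step / (2 * math.pi)


def g(t):
    return math.sin(t)        # g(t) = sin t: h(x) is the first sine coefficient


def x1(t):
    return 3 * math.sin(t) + math.cos(t)


def x2(t):
    return t                  # continuous on [0, 2*pi] but not periodic


def combo(t):
    return 2 * x1(t) + 5 * x2(t)


print(round(h(x1, g), 6))  # 1.5
# Linearity: h(2*x1 + 5*x2) agrees with 2*h(x1) + 5*h(x2) up to rounding error.
print(math.isclose(h(combo, g), 2 * h(x1, g) + 5 * h(x2, g), abs_tol=1e-9))  # True
```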
Example 2


Let $V = M_{n\times n}(F)$, and define $f\colon V \to F$ by $f(A) = \operatorname{tr}(A)$, the trace of A. By
Exercise 6 of Section 1.3, we have that f is a linear functional.


Example 3
Let V be a finite-dimensional vector space, and let $\beta = \{x_1, x_2, \dots, x_n\}$ be
an ordered basis for V. For each $i = 1, 2, \dots, n$, define $f_i(x) = a_i$, where
$$[x]_\beta = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
is the coordinate vector of x relative to β. Then $f_i$ is a linear functional on V
called the ith coordinate function with respect to the basis β. Note
that $f_i(x_j) = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. These linear functionals
play an important role in the theory of dual spaces (see Theorem 2.24).


Definition. For a vector space V over F, we define the dual space of
V to be the vector space $\mathcal{L}(V, F)$, denoted by V*.


Thus V* is the vector space consisting of all linear functionals on V with
the operations of addition and scalar multiplication as defined in Section 2.2.
Note that if V is finite-dimensional, then by the corollary to Theorem 2.20
(p. 104)
$$\dim(V^*) = \dim(\mathcal{L}(V, F)) = \dim(V)\cdot\dim(F) = \dim(V).$$
Hence by Theorem 2.19 (p. 103), V and V* are isomorphic. We also define
the double dual V** of V to be the dual of V*. In Theorem 2.26, we show, 
in fact, that there is a natural identification of V and V** in the case that V 
is finite-dimensional. 


Theorem 2.24. Suppose that V is a finite-dimensional vector space with
the ordered basis $\beta = \{x_1, x_2, \dots, x_n\}$. Let $f_i$ ($1 \le i \le n$) be the ith coordinate
function with respect to β as just defined, and let $\beta^* = \{f_1, f_2, \dots, f_n\}$.
Then β* is an ordered basis for V*, and, for any $f \in V^*$, we have
$$f = \sum_{i=1}^{n} f(x_i)f_i.$$

Proof. Let $f \in V^*$. Since $\dim(V^*) = n$, we need only show that
$$f = \sum_{i=1}^{n} f(x_i)f_i,$$
from which it follows that β* generates V*, and hence is a basis by Corollary
2(a) to the replacement theorem (p. 47). Let
$$g = \sum_{i=1}^{n} f(x_i)f_i.$$
For $1 \le j \le n$, we have
$$g(x_j) = \Bigl(\sum_{i=1}^{n} f(x_i)f_i\Bigr)(x_j) = \sum_{i=1}^{n} f(x_i)f_i(x_j) = \sum_{i=1}^{n} f(x_i)\delta_{ij} = f(x_j).$$
Therefore f = g by the corollary to Theorem 2.6 (p. 72). ■
Definition. Using the notation of Theorem 2.24, we call the ordered
basis $\beta^* = \{f_1, f_2, \dots, f_n\}$ of V* that satisfies $f_i(x_j) = \delta_{ij}$ ($1 \le i, j \le n$) the
dual basis of β.
Example 4

Let $\beta = \{(2, 1), (3, 1)\}$ be an ordered basis for $R^2$. Suppose that the dual
basis of β is given by $\beta^* = \{f_1, f_2\}$. To explicitly determine a formula for $f_1$,
we need to consider the equations
$$\begin{aligned} 1 &= f_1(2, 1) = f_1(2e_1 + e_2) = 2f_1(e_1) + f_1(e_2), \\ 0 &= f_1(3, 1) = f_1(3e_1 + e_2) = 3f_1(e_1) + f_1(e_2). \end{aligned}$$
Solving these equations, we obtain $f_1(e_1) = -1$ and $f_1(e_2) = 3$; that is,
$f_1(x, y) = -x + 3y$. Similarly, it can be shown that $f_2(x, y) = x - 2y$.
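Example 4 generalizes as follows. If the vectors of β are the columns of an invertible matrix B, then the rows of $B^{-1}$ give the coefficients of the dual basis functionals, because $(B^{-1}B)_{ij} = f_i(x_j) = \delta_{ij}$. The sketch below applies this shortcut to β = {(2, 1), (3, 1)}. The shortcut is ours, not the method used in the text. It reproduces $f_1(x, y) = -x + 3y$ and $f_2(x, y) = x - 2y$.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


basis = [(2, 1), (3, 1)]                                  # beta
B = [[basis[j][i] for j in range(2)] for i in range(2)]   # vectors of beta as columns

# Row i of B^{-1} gives f_i(x, y) = rows[i][0] * x + rows[i][1] * y.
rows = inverse_2x2(B)
print(rows == [[-1, 3], [1, -2]])  # True: f1 = -x + 3y, f2 = x - 2y

# Check the defining property f_i(x_j) = delta_ij.
values = [[r[0] * v[0] + r[1] * v[1] for v in basis] for r in rows]
print(values == [[1, 0], [0, 1]])  # True
```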
We now assume that V and W are finite-dimensional vector spaces over F
with ordered bases β and γ, respectively. In Section 2.4, we proved that there
is a one-to-one correspondence between linear transformations $T\colon V \to W$ and
m × n matrices (over F) via the correspondence $T \leftrightarrow [T]_\beta^\gamma$. For a matrix of
the form $A = [T]_\beta^\gamma$, the question arises as to whether or not there exists a
linear transformation U associated with T in some natural way such that U
may be represented in some basis as $A^t$. Of course, if $m \ne n$, it would be
impossible for U to be a linear transformation from V into W. We now answer
this question by applying what we have already learned about dual spaces.


Theorem 2.25. Let V and W be finite-dimensional vector spaces over
F with ordered bases β and γ, respectively. For any linear transformation
$T\colon V \to W$, the mapping $T^t\colon W^* \to V^*$ defined by $T^t(g) = gT$ for all $g \in W^*$
is a linear transformation with the property that $[T^t]_{\gamma^*}^{\beta^*} = ([T]_\beta^\gamma)^t$.


Proof. For $g \in W^*$, it is clear that $T^t(g) = gT$ is a linear functional on V
and hence is in V*. Thus $T^t$ maps W* into V*. We leave the proof that $T^t$ is
linear to the reader.

To complete the proof, let $\beta = \{x_1, x_2, \dots, x_n\}$ and $\gamma = \{y_1, y_2, \dots, y_m\}$
with dual bases $\beta^* = \{f_1, f_2, \dots, f_n\}$ and $\gamma^* = \{g_1, g_2, \dots, g_m\}$, respectively.
For convenience, let $A = [T]_\beta^\gamma$. To find the jth column of $[T^t]_{\gamma^*}^{\beta^*}$, we begin
by expressing $T^t(g_j)$ as a linear combination of the vectors of β*. By
Theorem 2.24, we have
$$T^t(g_j) = g_jT = \sum_{s=1}^{n} (g_jT)(x_s)f_s.$$
So the row i, column j entry of $[T^t]_{\gamma^*}^{\beta^*}$ is
$$(g_jT)(x_i) = g_j(T(x_i)) = g_j\Bigl(\sum_{k=1}^{m} A_{ki}y_k\Bigr) = \sum_{k=1}^{m} A_{ki}g_j(y_k) = \sum_{k=1}^{m} A_{ki}\delta_{jk} = A_{ji}.$$
Hence $[T^t]_{\gamma^*}^{\beta^*} = A^t$. ■


The linear transformation $T^t$ defined in Theorem 2.25 is called the transpose
of T. It is clear that $T^t$ is the unique linear transformation U such that
$[U]_{\gamma^*}^{\beta^*} = ([T]_\beta^\gamma)^t$.
We illustrate Theorem 2.25 with the next example.
Example 5


Define $T\colon P_1(R) \to R^2$ by $T(p(x)) = (p(0), p(2))$. Let β and γ be the standard
ordered bases for $P_1(R)$ and $R^2$, respectively. Clearly,
$$[T]_\beta^\gamma = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix}.$$
We compute $[T^t]_{\gamma^*}^{\beta^*}$ directly from the definition. Let $\beta^* = \{f_1, f_2\}$ and
$\gamma^* = \{g_1, g_2\}$. Suppose that $[T^t]_{\gamma^*}^{\beta^*} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then $T^t(g_1) = af_1 + cf_2$. So
$$T^t(g_1)(1) = (af_1 + cf_2)(1) = af_1(1) + cf_2(1) = a(1) + c(0) = a.$$
But also
$$(T^t(g_1))(1) = g_1(T(1)) = g_1(1, 1) = 1.$$
So $a = 1$. Using similar computations, we obtain that $c = 0$, $b = 1$, and
$d = 2$. Hence a direct computation yields
$$[T^t]_{\gamma^*}^{\beta^*} = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix} = \bigl([T]_\beta^\gamma\bigr)^t,$$
as predicted by Theorem 2.25.
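Example 5 can also be checked mechanically. The sketch uses a representation of our own: a polynomial $a_0 + a_1x \in P_1(R)$ is stored as the pair $(a_0, a_1)$. It then builds $[T^t]_{\gamma^*}^{\beta^*}$ entry by entry from $(g_jT)(x_i)$, the formula in the proof of Theorem 2.25, and compares the result with the transpose of $[T]_\beta^\gamma$.

```python
# Hypothetical representation: a0 + a1*x in P1(R) is stored as the pair (a0, a1).
def T(p):
    a0, a1 = p
    return (a0, a0 + 2 * a1)          # (p(0), p(2))


beta = [(1, 0), (0, 1)]               # the standard ordered basis {1, x}


def g1(w):
    return w[0]                       # first coordinate function on R^2


def g2(w):
    return w[1]                       # second coordinate function on R^2


gamma_star = [g1, g2]

# [T]_beta^gamma: column i is T(x_i); gamma is standard, so coordinates are entries.
T_mat = [[T(x)[r] for x in beta] for r in range(2)]

# [T^t]: the (i, j) entry is (g_j T)(x_i), as in the proof of Theorem 2.25.
Tt_mat = [[g(T(x)) for g in gamma_star] for x in beta]

transpose = [[T_mat[j][i] for j in range(2)] for i in range(2)]
print(T_mat)                 # [[1, 0], [1, 2]]
print(Tt_mat)                # [[1, 1], [0, 2]]
print(Tt_mat == transpose)   # True
```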


We now concern ourselves with demonstrating that any finite-dimensional 
vector space V can be identified in a natural way with its double dual V**. 
There is, in fact, an isomorphism between V and V** that does not depend 
on any choice of bases for the two vector spaces. 

For a vector $x \in V$, we define $\hat{x}\colon V^* \to F$ by $\hat{x}(f) = f(x)$ for every $f \in V^*$.
It is easy to verify that $\hat{x}$ is a linear functional on V*, so $\hat{x} \in V^{**}$. The
correspondence $x \leftrightarrow \hat{x}$ allows us to define the desired isomorphism between
V and V**.


Lemma. Let V be a finite-dimensional vector space, and let $x \in V$. If
$\hat{x}(f) = 0$ for all $f \in V^*$, then $x = 0$.


Proof. Let $x \ne 0$. We show that there exists $f \in V^*$ such that $\hat{x}(f) \ne 0$.
Choose an ordered basis $\beta = \{x_1, x_2, \dots, x_n\}$ for V such that $x_1 = x$. Let
$\{f_1, f_2, \dots, f_n\}$ be the dual basis of β. Then $f_1(x_1) = 1 \ne 0$. Let $f = f_1$. ■


Theorem 2.26. Let V be a finite-dimensional vector space, and define
$\psi\colon V \to V^{**}$ by $\psi(x) = \hat{x}$. Then ψ is an isomorphism.
Proof. (a) ψ is linear: Let $x, y \in V$ and $c \in F$. For $f \in V^*$, we have
$$\psi(cx + y)(f) = f(cx + y) = cf(x) + f(y) = c\hat{x}(f) + \hat{y}(f) = (c\hat{x} + \hat{y})(f).$$
Therefore
$$\psi(cx + y) = c\hat{x} + \hat{y} = c\psi(x) + \psi(y).$$

(b) ψ is one-to-one: Suppose that ψ(x) is the zero functional on V* for
some $x \in V$. Then $\hat{x}(f) = 0$ for every $f \in V^*$. By the previous lemma, we
conclude that $x = 0$.

(c) ψ is an isomorphism: This follows from (b) and the fact that $\dim(V) =
\dim(V^{**})$. ■


Corollary. Let V be a finite-dimensional vector space with dual space V*.
Then every ordered basis for V* is the dual basis for some basis for V.


Proof. Let $\{f_1, f_2, \dots, f_n\}$ be an ordered basis for V*. We may combine
Theorems 2.24 and 2.26 to conclude that for this basis for V* there exists a
dual basis $\{\hat{x}_1, \hat{x}_2, \dots, \hat{x}_n\}$ in V**, that is, $\delta_{ij} = \hat{x}_i(f_j) = f_j(x_i)$ for all i and
j. Thus $\{f_1, f_2, \dots, f_n\}$ is the dual basis of $\{x_1, x_2, \dots, x_n\}$. ■


Although many of the ideas of this section (e.g., the existence of a dual
space) can be extended to the case where V is not finite-dimensional, only a
finite-dimensional vector space is isomorphic to its double dual via the map
$x \to \hat{x}$. In fact, for infinite-dimensional vector spaces, no two of V, V*, and
V** are isomorphic.


EXERCISES 


1. Label the following statements as true or false. Assume that all vector
spaces are finite-dimensional.

(a) Every linear transformation is a linear functional.
(b) A linear functional defined on a field may be represented as a 1 × 1
matrix.
(c) Every vector space is isomorphic to its dual space.
(d) Every vector space is the dual of some other vector space.
(e) If T is an isomorphism from V onto V* and β is a finite ordered
basis for V, then T(β) = β*.
(f) If T is a linear transformation from V to W, then the domain of
$(T^t)^t$ is V**.
(g) If V is isomorphic to W, then V* is isomorphic to W*.
(h) The derivative of a function may be considered as a linear functional
on the vector space of differentiable functions.


2. For the following functions $f$ on a vector space $V$, determine which are linear functionals.

(a) $V = P(\mathbb{R})$; $f(p(x)) = 2p'(0) + p'(1)$, where $'$ denotes differentiation
(b) $V = \mathbb{R}^2$; $f(x, y) = (2x, \ldots)$
(c) $V = M_{2 \times 2}(F)$; $f(A) = \operatorname{tr}(A)$
(d) …
(e) $V = P(\mathbb{R})$; $f(p(x)) = \ldots$
(f) $V = M_{2 \times 2}(F)$; $f(A) = \ldots$


3. For each of the following vector spaces $V$ and bases $\beta$, find explicit formulas for vectors of the dual basis $\beta^*$ for $V^*$, as in Example 4.

(a) $V = \mathbb{R}^3$; $\beta = \{(1, 0, 1), (1, 2, 1), (0, 0, 1)\}$


4. Let $V = \mathbb{R}^3$, and define $f_1, f_2, f_3 \in V^*$ as follows:

$$f_1(x, y, z) = x - 2y, \qquad f_2(x, y, z) = x + y + z, \qquad f_3(x, y, z) = y - 3z.$$

Prove that $\{f_1, f_2, f_3\}$ is a basis for $V^*$, and then find a basis for $V$ for which it is the dual basis.


5. Let $V = P_1(\mathbb{R})$, and, for $p(x) \in V$, define $f_1, f_2 \in V^*$ by

$$f_1(p(x)) = \int_0^1 p(t)\,dt \quad \text{and} \quad f_2(p(x)) = \int_0^2 p(t)\,dt.$$

Prove that $\{f_1, f_2\}$ is a basis for $V^*$, and find a basis for $V$ for which it is the dual basis.


6. Define $f \in (\mathbb{R}^2)^*$ by $f(x, y) = 2x + y$ and $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ by $T(x, y) = (3x + 2y, x)$.

(a) Compute $T^t(f)$.

(b) Compute $[T^t]_{\beta^*}$, where $\beta$ is the standard ordered basis for $\mathbb{R}^2$ and $\beta^* = \{f_1, f_2\}$ is the dual basis, by finding scalars $a$, $b$, $c$, and $d$ such that $T^t(f_1) = af_1 + cf_2$ and $T^t(f_2) = bf_1 + df_2$.

(c) Compute $[T]_\beta$ and $([T]_\beta)^t$, and compare your results with (b).


7. Let $V = P_1(\mathbb{R})$ and $W = \mathbb{R}^2$ with respective standard ordered bases $\beta$ and $\gamma$. Define $T\colon V \to W$ by

$$T(p(x)) = (p(0) - 2p(1),\ p(0) + p'(0)),$$

where $p'(x)$ is the derivative of $p(x)$.

(a) For $f \in W^*$ defined by $f(a, b) = a - 2b$, compute $T^t(f)$.

(b) Compute $[T^t]_{\gamma^*}^{\beta^*}$ without appealing to Theorem 2.25.

(c) Compute $[T]_\beta^\gamma$ and its transpose, and compare your results with (b).


8. Show that every plane through the origin in $\mathbb{R}^3$ may be identified with the null space of a vector in $(\mathbb{R}^3)^*$. State an analogous result for $\mathbb{R}^2$.


9. Prove that a function $T\colon F^n \to F^m$ is linear if and only if there exist $f_1, f_2, \ldots, f_m \in (F^n)^*$ such that $T(x) = (f_1(x), f_2(x), \ldots, f_m(x))$ for all $x \in F^n$. Hint: If $T$ is linear, define $f_i(x) = (g_iT)(x)$ for $x \in F^n$; that is, $f_i = T^t(g_i)$ for $1 \le i \le m$, where $\{g_1, g_2, \ldots, g_m\}$ is the dual basis of the standard ordered basis for $F^m$.


10. Let $V = P_n(F)$, and let $c_0, c_1, \ldots, c_n$ be distinct scalars in $F$.

(a) For $0 \le i \le n$, define $f_i \in V^*$ by $f_i(p(x)) = p(c_i)$. Prove that $\{f_0, f_1, \ldots, f_n\}$ is a basis for $V^*$. Hint: Apply any linear combination of this set that equals the zero transformation to $p(x) = (x - c_1)(x - c_2)\cdots(x - c_n)$, and deduce that the first coefficient is zero.

(b) Use the corollary to Theorem 2.26 and (a) to show that there exist unique polynomials $p_0(x), p_1(x), \ldots, p_n(x)$ such that $p_i(c_j) = \delta_{ij}$ for $0 \le i, j \le n$. These polynomials are the Lagrange polynomials defined in Section 1.6.

(c) For any scalars $a_0, a_1, \ldots, a_n$ (not necessarily distinct), deduce that there exists a unique polynomial $q(x)$ of degree at most $n$ such that $q(c_i) = a_i$ for $0 \le i \le n$. In fact,

$$q(x) = \sum_{i=0}^{n} a_i p_i(x).$$

(d) Deduce the Lagrange interpolation formula:

$$p(x) = \sum_{i=0}^{n} p(c_i) p_i(x)$$

for any $p(x) \in V$.

(e) Prove that

$$\int_a^b p(t)\,dt = \sum_{i=0}^{n} p(c_i) d_i,$$

where

$$d_i = \int_a^b p_i(t)\,dt.$$

Suppose now that

$$c_i = a + \frac{i(b - a)}{n} \quad \text{for } i = 0, 1, \ldots, n.$$

For $n = 1$, the preceding result yields the trapezoidal rule for evaluating the definite integral of a polynomial. For $n = 2$, this result yields Simpson's rule for evaluating the definite integral of a polynomial.


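11. Let $V$ and $W$ be finite-dimensional vector spaces over $F$, and let $\psi_1$ and $\psi_2$ be the isomorphisms between $V$ and $V^{**}$ and $W$ and $W^{**}$, respectively, as defined in Theorem 2.26. Let $T\colon V \to W$ be linear, and define $T^{tt} = (T^t)^t$. Prove that the diagram depicted in Figure 2.6 commutes (i.e., prove that $\psi_2 T = T^{tt}\psi_1$).

[Figure 2.6: square diagram with $T\colon V \to W$ across the top, $\psi_1\colon V \to V^{**}$ and $\psi_2\colon W \to W^{**}$ down the sides, and $T^{tt}\colon V^{**} \to W^{**}$ across the bottom.]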


12. Let $V$ be a finite-dimensional vector space with the ordered basis $\beta$. Prove that $\psi(\beta) = \beta^{**}$, where $\psi$ is defined in Theorem 2.26.


In Exercises 13 through 17, $V$ denotes a finite-dimensional vector space over $F$. For every subset $S$ of $V$, define the annihilator $S^0$ of $S$ as

$$S^0 = \{f \in V^*\colon f(x) = 0 \text{ for all } x \in S\}.$$


13. (a) Prove that $S^0$ is a subspace of $V^*$.

(b) If $W$ is a subspace of $V$ and $x \notin W$, prove that there exists $f \in W^0$ such that $f(x) \ne 0$.

(c) Prove that $(S^0)^0 = \operatorname{span}(\psi(S))$, where $\psi$ is defined as in Theorem 2.26.

(d) For subspaces $W_1$ and $W_2$, prove that $W_1 = W_2$ if and only if $W_1^0 = W_2^0$.

(e) For subspaces $W_1$ and $W_2$, show that $(W_1 + W_2)^0 = W_1^0 \cap W_2^0$.


14. Prove that if $W$ is a subspace of $V$, then $\dim(W) + \dim(W^0) = \dim(V)$. Hint: Extend an ordered basis $\{x_1, x_2, \ldots, x_k\}$ of $W$ to an ordered basis $\beta = \{x_1, x_2, \ldots, x_n\}$ of $V$. Let $\beta^* = \{f_1, f_2, \ldots, f_n\}$. Prove that $\{f_{k+1}, f_{k+2}, \ldots, f_n\}$ is a basis for $W^0$.
15. Suppose that $W$ is a finite-dimensional vector space and that $T\colon V \to W$ is linear. Prove that $N(T^t) = (R(T))^0$.


16. Use Exercises 14 and 15 to deduce that $\operatorname{rank}(L_{A^t}) = \operatorname{rank}(L_A)$ for any $A \in M_{m \times n}(F)$.


17. Let $T$ be a linear operator on $V$, and let $W$ be a subspace of $V$. Prove that $W$ is $T$-invariant (as defined in the exercises of Section 2.1) if and only if $W^0$ is $T^t$-invariant.


18. Let $V$ be a nonzero vector space over a field $F$, and let $S$ be a basis for $V$. (By the corollary to Theorem 1.13 (p. 60) in Section 1.7, every vector space has a basis.) Let $\Phi\colon V^* \to L(S, F)$ be the mapping defined by $\Phi(f) = f_S$, the restriction of $f$ to $S$. Prove that $\Phi$ is an isomorphism. Hint: Apply Exercise 34 of Section 2.1.


19. Let $V$ be a nonzero vector space, and let $W$ be a proper subspace of $V$ (i.e., $W \ne V$). Prove that there exists a nonzero linear functional $f \in V^*$ such that $f(x) = 0$ for all $x \in W$. Hint: For the infinite-dimensional case, use Exercise 34 of Section 2.1 as well as results about extending linearly independent sets to bases in Section 1.7.


20. Let $V$ and $W$ be nonzero vector spaces over the same field, and let $T\colon V \to W$ be a linear transformation.

(a) Prove that $T$ is onto if and only if $T^t$ is one-to-one.
(b) Prove that $T^t$ is onto if and only if $T$ is one-to-one.

Hint: Parts of the proof require the result of Exercise 19 for the infinite-dimensional case.
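The quadrature rule in Exercise 10(e) above can be explored numerically. The following Python sketch (hypothetical helper names; real scalars assumed; exact rational arithmetic via the standard fractions module) builds each Lagrange polynomial $p_i$ for the equally spaced nodes $c_i = a + i(b - a)/n$, integrates it exactly over $[a, b]$ to obtain $d_i$, and recovers the trapezoidal weights $\tfrac{1}{2}, \tfrac{1}{2}$ for $n = 1$ and Simpson's weights $\tfrac{1}{6}, \tfrac{2}{3}, \tfrac{1}{6}$ for $n = 2$ on $[0, 1]$. It illustrates the statement of the exercise; it does not prove it.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def lagrange_poly(nodes, i):
    """Coefficients of p_i, the polynomial with p_i(c_j) = 1 if j == i and 0 otherwise."""
    p = [Fraction(1)]
    for j, c in enumerate(nodes):
        if j != i:
            # Multiply by (x - c_j) / (c_i - c_j).
            denom = nodes[i] - c
            p = poly_mul(p, [-c / denom, 1 / denom])
    return p

def integrate(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    return sum(coef * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, coef in enumerate(p))

def quadrature_weights(a, b, n):
    """Weights d_i = integral of p_i over [a, b] for the nodes c_i = a + i(b - a)/n."""
    a, b = Fraction(a), Fraction(b)
    nodes = [a + i * (b - a) / n for i in range(n + 1)]
    return [integrate(lagrange_poly(nodes, i), a, b) for i in range(n + 1)]

print(quadrature_weights(0, 1, 1))  # trapezoidal rule weights
print(quadrature_weights(0, 1, 2))  # Simpson's rule weights
```

With these weights, $\sum_i p(c_i)d_i$ reproduces $\int_a^b p(t)\,dt$ exactly for every polynomial of degree at most $n$, which is the content of part (e).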


2.7* HOMOGENEOUS LINEAR DIFFERENTIAL EQUATIONS 
WITH CONSTANT COEFFICIENTS 


As an introduction to this section, consider the following physical problem. A 
weight of mass m is attached to a vertically suspended spring that is allowed to 
stretch until the forces acting on the weight are in equilibrium. Suppose that 
the weight is now motionless and impose an xy-coordinate system with the 
weight at the origin and the spring lying on the positive y-axis (see Figure 2.7). 


Suppose that at a certain time, say t = 0, the weight is lowered a distance 
s along the y-axis and released. The spring then begins to oscillate. 

[Figure 2.7]

We describe the motion of the spring. At any time $t > 0$, let $F(t)$ denote the force acting on the weight and $y(t)$ denote the position of the weight along the $y$-axis. For example, $y(0) = -s$. The second derivative of $y$ with respect to time, $y''(t)$, is the acceleration of the weight at time $t$; hence, by Newton's second law of motion,


$$F(t) = my''(t). \tag{1}$$


It is reasonable to assume that the force acting on the weight is due totally 
to the tension of the spring, and that this force satisfies Hooke’s law: The force 
acting on the weight is proportional to its displacement from the equilibrium 
position, but acts in the opposite direction. If k > 0 is the proportionality 
constant, then Hooke’s law states that 


$$F(t) = -ky(t). \tag{2}$$


Combining (1) and (2), we obtain $my'' = -ky$ or

$$y'' + \frac{k}{m} y = 0. \tag{3}$$


The expression (3) is an example of a differential equation. A differential 
equation in an unknown function $y = y(t)$ is an equation involving $y$, $t$, and derivatives of $y$. If the differential equation is of the form

$$a_n y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y^{(1)} + a_0 y = f, \tag{4}$$

where $a_0, a_1, \ldots, a_n$ and $f$ are functions of $t$ and $y^{(k)}$ denotes the $k$th derivative of $y$, then the equation is said to be linear. The functions $a_i$ are called
the coefficients of the differential equation (4). Thus (3) is an example 
of a linear differential equation in which the coefficients are constants and 
the function f is identically zero. When f is identically zero, (4) is called 
homogeneous. 

In this section, we apply the linear algebra we have studied to solve homogeneous linear differential equations with constant coefficients. If $a_n \ne 0$, we say that differential equation (4) is of order $n$. In this case, we divide both sides by $a_n$ to obtain a new, but equivalent, equation

$$y^{(n)} + b_{n-1} y^{(n-1)} + \cdots + b_1 y^{(1)} + b_0 y = 0,$$

where $b_i = a_i / a_n$ for $i = 0, 1, \ldots, n - 1$. Because of this observation, we always assume that the coefficient $a_n$ in (4) is 1.

A solution to (4) is a function that when substituted for y reduces (4) 
to an identity. 
Example 1
The function $y(t) = \sin\sqrt{k/m}\,t$ is a solution to (3) since

$$y''(t) + \frac{k}{m} y(t) = -\frac{k}{m} \sin\sqrt{\frac{k}{m}}\,t + \frac{k}{m} \sin\sqrt{\frac{k}{m}}\,t = 0$$

for all $t$. Notice, however, that substituting $y(t) = t$ into (3) yields

$$y''(t) + \frac{k}{m} y(t) = \frac{k}{m} t,$$

which is not identically zero. Thus $y(t) = t$ is not a solution to (3). ◆
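A numerical spot-check makes Example 1 concrete. The sketch below approximates $y''$ by a central difference and evaluates the left side of (3) for $y(t) = \sin\sqrt{k/m}\,t$ and for $y(t) = t$. The values $k = 2$ and $m = 0.5$, the sample times, and the step size are arbitrary assumptions. The first residual is zero only up to discretization and rounding error, while the second is approximately $(k/m)t$.

```python
import math

K, M = 2.0, 0.5  # hypothetical spring constant k and mass m, so k/m = 4

def second_derivative(y, t, h=1e-4):
    """Central-difference approximation of y''(t)."""
    return (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)

def residual(y, t):
    """Left side of (3), y'' + (k/m) y, evaluated at time t."""
    return second_derivative(y, t) + (K / M) * y(t)

omega = math.sqrt(K / M)

def solution(t):
    return math.sin(omega * t)

def not_solution(t):
    return t

for t in (0.0, 0.7, 2.5):
    # First residual is close to 0; second is close to (k/m) t.
    print(t, residual(solution, t), residual(not_solution, t))
```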


In our study of differential equations, it is useful to regard solutions as 
complex-valued functions of a real variable even though the solutions that 
are meaningful to us in a physical sense are real-valued. The convenience 
of this viewpoint will become clear later. Thus we are concerned with the 
vector space F(R,C) (as defined in Example 3 of Section 1.2). In order to 
consider complex-valued functions of a real variable as solutions to differential 
equations, we must define what it means to differentiate such functions. Given 
a complex-valued function $x \in \mathcal{F}(\mathbb{R}, \mathbb{C})$ of a real variable $t$, there exist unique real-valued functions $x_1$ and $x_2$ of $t$ such that

$$x(t) = x_1(t) + i x_2(t) \quad \text{for } t \in \mathbb{R},$$

where $i$ is the imaginary number such that $i^2 = -1$. We call $x_1$ the real part and $x_2$ the imaginary part of $x$.


Definitions. Given a function $x \in \mathcal{F}(\mathbb{R}, \mathbb{C})$ with real part $x_1$ and imaginary part $x_2$, we say that $x$ is differentiable if $x_1$ and $x_2$ are differentiable. If $x$ is differentiable, we define the derivative $x'$ of $x$ by

$$x' = x_1' + i x_2'.$$


We illustrate some computations with complex-valued functions in the 
following example. 
Example 2
Suppose that $x(t) = \cos 2t + i \sin 2t$. Then

$$x'(t) = -2\sin 2t + 2i\cos 2t.$$

We next find the real and imaginary parts of $x^2$. Since

$$\begin{aligned}
x^2(t) &= (\cos 2t + i\sin 2t)^2 = (\cos^2 2t - \sin^2 2t) + i(2\sin 2t\cos 2t) \\
&= \cos 4t + i\sin 4t,
\end{aligned}$$

the real part of $x^2(t)$ is $\cos 4t$, and the imaginary part is $\sin 4t$. ◆
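Example 2 can be spot-checked with Python's built-in complex numbers. The sketch below follows the definition of $x'$ literally: it differentiates the real part $x_1$ and the imaginary part $x_2$ separately (by central differences, with an arbitrarily chosen step size and sample point) and compares the result with $-2\sin 2t + 2i\cos 2t$; it also compares $x^2(t)$ with $\cos 4t + i\sin 4t$.

```python
import math

def x(t):
    """x(t) = cos 2t + i sin 2t."""
    return complex(math.cos(2 * t), math.sin(2 * t))

def derivative(f, t, h=1e-6):
    """x' = x1' + i x2', with each real derivative approximated by a central difference."""
    re = (f(t + h).real - f(t - h).real) / (2 * h)
    im = (f(t + h).imag - f(t - h).imag) / (2 * h)
    return complex(re, im)

t = 0.3
print(derivative(x, t), complex(-2 * math.sin(2 * t), 2 * math.cos(2 * t)))
print(x(t) ** 2, complex(math.cos(4 * t), math.sin(4 * t)))
```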


The next theorem indicates that we may limit our investigations to a 
vector space considerably smaller than F(R, C). Its proof, which is illustrated 
in Example 3, involves a simple induction argument, which we omit. 


Theorem 2.27. Any solution to a homogeneous linear differential equa- 
tion with constant coefficients has derivatives of all orders; that is, if x is a 
solution to such an equation, then $x^{(k)}$ exists for every positive integer $k$.


Example 3
To illustrate Theorem 2.27, consider the equation

$$y^{(2)} + 4y = 0.$$

Clearly, to qualify as a solution, a function $y$ must have two derivatives. If $y$ is a solution, however, then

$$y^{(2)} = -4y.$$

Thus since $y^{(2)}$ is a constant multiple of a function $y$ that has two derivatives, $y^{(2)}$ must have two derivatives. Hence $y^{(4)}$ exists; in fact,

$$y^{(4)} = -4y^{(2)}.$$

Since $y^{(4)}$ is a constant multiple of a function that we have shown has at least two derivatives, it also has at least two derivatives; hence $y^{(6)}$ exists. Continuing in this manner, we can show that any solution has derivatives of all orders. ◆


Definition. We use $C^\infty$ to denote the set of all functions in $\mathcal{F}(\mathbb{R}, \mathbb{C})$ that have derivatives of all orders.


It is a simple exercise to show that $C^\infty$ is a subspace of $\mathcal{F}(\mathbb{R}, \mathbb{C})$ and hence a vector space over $\mathbb{C}$. In view of Theorem 2.27, it is this vector space that is of interest to us. For $x \in C^\infty$, the derivative $x'$ of $x$ also lies in $C^\infty$. We can use the derivative operation to define a mapping $D\colon C^\infty \to C^\infty$ by

$$D(x) = x' \quad \text{for } x \in C^\infty.$$

It is easy to show that $D$ is a linear operator. More generally, consider any polynomial over $\mathbb{C}$ of the form

$$p(t) = a_n t^n + a_{n-1} t^{n-1} + \cdots + a_1 t + a_0.$$

If we define

$$p(D) = a_n D^n + a_{n-1} D^{n-1} + \cdots + a_1 D + a_0 I,$$

then $p(D)$ is a linear operator on $C^\infty$. (See Appendix E.)


Definitions. For any polynomial p(t) over C of positive degree, p(D) is 
called a differential operator. The order of the differential operator p(D) 
is the degree of the polynomial p(t). 


Differential operators are useful since they provide us with a means of 
reformulating a differential equation in the context of linear algebra. Any 
homogeneous linear differential equation with constant coefficients, 


$$y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y^{(1)} + a_0 y = 0,$$

can be rewritten using differential operators as

$$(D^n + a_{n-1} D^{n-1} + \cdots + a_1 D + a_0 I)(y) = 0.$$


Definition. Given the differential equation above, the complex polynomial

$$p(t) = t^n + a_{n-1} t^{n-1} + \cdots + a_1 t + a_0$$

is called the auxiliary polynomial associated with the equation.


For example, (3) has the auxiliary polynomial

$$p(t) = t^2 + \frac{k}{m}.$$

Any homogeneous linear differential equation with constant coefficients can be rewritten as

$$p(D)(y) = 0,$$

where $p(t)$ is the auxiliary polynomial associated with the equation. Clearly, this equation implies the following theorem.

Theorem 2.28. The set of all solutions to a homogeneous linear differential equation with constant coefficients coincides with the null space of $p(D)$, where $p(t)$ is the auxiliary polynomial associated with the equation.


Proof. Exercise. ∎


Corollary. The set of all solutions to a homogeneous linear differential equation with constant coefficients is a subspace of $C^\infty$.


In view of the preceding corollary, we call the set of solutions to a homo- 
geneous linear differential equation with constant coefficients the solution 
space of the equation. A practical way of describing such a space is in terms 
of a basis. We now examine a certain class of functions that is of use in 
finding bases for these solution spaces. 

For a real number $s$, we are familiar with the real number $e^s$, where $e$ is the unique number whose natural logarithm is 1 (i.e., $\ln e = 1$). We know, for instance, certain properties of exponentiation, namely,

$$e^{s+t} = e^s e^t \quad \text{and} \quad e^{-t} = \frac{1}{e^t}$$
for any real numbers s and t. We now extend the definition of powers of e to 
include complex numbers in such a way that these properties are preserved. 


Definition. Let $c = a + ib$ be a complex number with real part $a$ and imaginary part $b$. Define

$$e^c = e^a(\cos b + i\sin b).$$

The special case

$$e^{ib} = \cos b + i\sin b$$

is called Euler's formula.

For example, for $c = 2 + i(\pi/3)$,

$$e^c = e^2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right) = e^2\left(\frac{1}{2} + i\frac{\sqrt{3}}{2}\right).$$

Clearly, if $c$ is real ($b = 0$), then we obtain the usual result: $e^c = e^a$. Using the approach of Example 2, we can show by the use of trigonometric identities that

$$e^{c+d} = e^c e^d \quad \text{and} \quad e^{-c} = \frac{1}{e^c}$$

for any complex numbers $c$ and $d$.
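The definition of $e^c$ translates directly into code. The sketch below (the function name exp_def is hypothetical) implements $e^c = e^a(\cos b + i\sin b)$, compares it with the standard library's cmath.exp, and spot-checks $e^{c+d} = e^c e^d$ and $e^{-c} = 1/e^c$ at arbitrarily chosen sample values. Agreement up to floating-point rounding at a few points is evidence for the identities, not a proof of them.

```python
import cmath
import math

def exp_def(c: complex) -> complex:
    """e^c := e^a (cos b + i sin b), where c = a + ib."""
    a, b = c.real, c.imag
    return math.exp(a) * complex(math.cos(b), math.sin(b))

c = complex(2, math.pi / 3)   # the example from the text
d = complex(-0.5, 1.25)       # an arbitrary second sample value

print(exp_def(c))                                      # e^2 (1/2 + i sqrt(3)/2)
print(abs(exp_def(c) - cmath.exp(c)))                  # agrees with the library up to rounding
print(abs(exp_def(c + d) - exp_def(c) * exp_def(d)))   # e^{c+d} = e^c e^d
print(abs(exp_def(-c) - 1 / exp_def(c)))               # e^{-c} = 1 / e^c
```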
Definition. A function $f\colon \mathbb{R} \to \mathbb{C}$ defined by $f(t) = e^{ct}$ for a fixed complex number $c$ is called an exponential function.


The derivative of an exponential function, as described in the next theo- 
rem, is consistent with the real version. The proof involves a straightforward 
computation, which we leave as an exercise. 


Theorem 2.29. For any exponential function $f(t) = e^{ct}$, $f'(t) = ce^{ct}$.

Proof. Exercise. ∎
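Theorem 2.29 is easy to test numerically. The sketch below compares a central-difference approximation of $f'(t)$ with $ce^{ct}$ for a few arbitrarily chosen values of $c$. Because Python's complex arithmetic acts on real and imaginary parts componentwise, the difference quotient differentiates $x_1$ and $x_2$ simultaneously, matching the definition of $x'$. The step size and sample point are assumptions.

```python
import cmath

def f(t, c):
    """The exponential function f(t) = e^{ct}."""
    return cmath.exp(c * t)

def derivative(g, t, h=1e-6):
    """Central difference; real and imaginary parts are differentiated together."""
    return (g(t + h) - g(t - h)) / (2 * h)

t = 0.8
for c in (1.5, -2j, complex(0.3, 4.0)):
    approx = derivative(lambda s: f(s, c), t)
    print(c, abs(approx - c * f(t, c)))  # small: f'(t) agrees with c e^{ct}
```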


We can use exponential functions to describe all solutions to a homoge- 
neous linear differential equation of order 1. Recall that the order of such an 
equation is the degree of its auxiliary polynomial. Thus an equation of order 
1 is of the form 


$$y' + a_0 y = 0. \tag{5}$$


Theorem 2.30. The solution space for (5) is of dimension 1 and has $\{e^{-a_0 t}\}$ as a basis.


Proof. Clearly (5) has $e^{-a_0 t}$ as a solution. Suppose that $x(t)$ is any solution to (5). Then

$$x'(t) = -a_0 x(t) \quad \text{for all } t \in \mathbb{R}.$$

Define

$$z(t) = e^{a_0 t} x(t).$$

Differentiating $z$ yields

$$z'(t) = (e^{a_0 t})' x(t) + e^{a_0 t} x'(t) = a_0 e^{a_0 t} x(t) - a_0 e^{a_0 t} x(t) = 0.$$

(Notice that the familiar product rule for differentiation holds for complex-valued functions of a real variable. A justification of this involves a lengthy, although direct, computation.)

Since $z'$ is identically zero, $z$ is a constant function. (Again, this fact, well known for real-valued functions, is also true for complex-valued functions. The proof, which relies on the real case, involves looking separately at the real and imaginary parts of $z$.) Thus there exists a complex number $k$ such that

$$z(t) = e^{a_0 t} x(t) = k \quad \text{for all } t \in \mathbb{R}.$$

So

$$x(t) = k e^{-a_0 t}.$$

We conclude that any solution to (5) is a linear combination of $e^{-a_0 t}$. ∎

Another way of stating Theorem 2.30 is as follows.


Corollary. For any complex number $c$, the null space of the differential operator $D - cI$ has $\{e^{ct}\}$ as a basis.


We next concern ourselves with differential equations of order greater than one. Given an $n$th-order homogeneous linear differential equation with constant coefficients,

$$y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y^{(1)} + a_0 y = 0,$$

its auxiliary polynomial

$$p(t) = t^n + a_{n-1} t^{n-1} + \cdots + a_1 t + a_0$$

factors into a product of polynomials of degree 1, that is,

$$p(t) = (t - c_1)(t - c_2) \cdots (t - c_n),$$

where $c_1, c_2, \ldots, c_n$ are (not necessarily distinct) complex numbers. (This follows from the fundamental theorem of algebra in Appendix D.) Thus

$$p(D) = (D - c_1 I)(D - c_2 I) \cdots (D - c_n I).$$

The operators $D - c_i I$ commute, and so, by Exercise 9, we have that

$$N(D - c_i I) \subseteq N(p(D)) \quad \text{for all } i.$$


Since N(p(D)) coincides with the solution space of the given differential equa- 
tion, we can deduce the following result from the preceding corollary. 


Theorem 2.31. Let $p(t)$ be the auxiliary polynomial for a homogeneous linear differential equation with constant coefficients. For any complex number $c$, if $c$ is a zero of $p(t)$, then $e^{ct}$ is a solution to the differential equation.


Example 4 


Given the differential equation

$$y'' - 3y' + 2y = 0,$$

its auxiliary polynomial is

$$p(t) = t^2 - 3t + 2 = (t - 1)(t - 2).$$

Hence, by Theorem 2.31, $e^t$ and $e^{2t}$ are solutions to the differential equation because $c = 1$ and $c = 2$ are zeros of $p(t)$. Since the solution space of the differential equation is a subspace of $C^\infty$, $\operatorname{span}(\{e^t, e^{2t}\})$ lies in the solution space. It is a simple matter to show that $\{e^t, e^{2t}\}$ is linearly independent. Thus if we can show that the solution space is two-dimensional, we can conclude that $\{e^t, e^{2t}\}$ is a basis for the solution space. This result is a consequence of the next theorem. ◆
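The claims in Example 4 can be spot-checked numerically. The sketch below (an illustration, with an arbitrary sample point and step size) approximates $y'' - 3y' + 2y$ by central differences for $y = e^{ct}$ with $c = 1, 2, 3$. Applying Theorem 2.29 repeatedly gives $p(D)(e^{ct}) = p(c)e^{ct}$, so the residual should vanish exactly when $p(c) = 0$. The last lines record one way to see the linear independence of $\{e^t, e^{2t}\}$: a vanishing combination $b_1e^t + b_2e^{2t}$ must vanish at $t = 0$ and $t = 1$, and the resulting $2 \times 2$ system has determinant $e^2 - e \ne 0$, forcing $b_1 = b_2 = 0$.

```python
import math

def residual(y, t, h=1e-4):
    """Approximate y'' - 3y' + 2y at t using central differences."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)
    return d2 - 3 * d1 + 2 * y(t)

def p(c):
    """Auxiliary polynomial p(t) = t^2 - 3t + 2 evaluated at c."""
    return c * c - 3 * c + 2

t0 = 0.5
for c in (1, 2, 3):
    y = lambda t, c=c: math.exp(c * t)
    # Expect residual(y, t0) to be close to p(c) e^{c t0}.
    print(c, p(c), residual(y, t0), p(c) * y(t0))

# If b1 e^t + b2 e^{2t} = 0 for all t, then t = 0 and t = 1 give
# b1 + b2 = 0 and b1 e + b2 e^2 = 0, a system with determinant e^2 - e.
det = math.exp(2) - math.exp(1)
print(det)  # nonzero, so b1 = b2 = 0
```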
Theorem 2.32. For any differential operator $p(D)$ of order $n$, the null space of $p(D)$ is an $n$-dimensional subspace of $C^\infty$.


As a preliminary to the proof of Theorem 2.32, we establish two lemmas. 


Lemma 1. The differential operator $D - cI\colon C^\infty \to C^\infty$ is onto for any complex number $c$.


Proof. Let $v \in C^\infty$. We wish to find a $u \in C^\infty$ such that $(D - cI)u = v$. Let $w(t) = v(t)e^{-ct}$ for $t \in \mathbb{R}$. Clearly, $w \in C^\infty$ because both $v$ and $e^{-ct}$ lie in $C^\infty$. Let $w_1$ and $w_2$ be the real and imaginary parts of $w$. Then $w_1$ and $w_2$ are continuous because they are differentiable. Hence they have antiderivatives, say, $W_1$ and $W_2$, respectively. Let $W\colon \mathbb{R} \to \mathbb{C}$ be defined by

$$W(t) = W_1(t) + iW_2(t) \quad \text{for } t \in \mathbb{R}.$$

Then $W \in C^\infty$, and the real and imaginary parts of $W$ are $W_1$ and $W_2$, respectively. Furthermore, $W' = w$. Finally, let $u\colon \mathbb{R} \to \mathbb{C}$ be defined by $u(t) = W(t)e^{ct}$ for $t \in \mathbb{R}$. Clearly $u \in C^\infty$, and since

$$\begin{aligned}
(D - cI)u(t) &= u'(t) - cu(t) \\
&= W'(t)e^{ct} + W(t)ce^{ct} - cW(t)e^{ct} \\
&= w(t)e^{ct} = v(t)e^{-ct}e^{ct} = v(t),
\end{aligned}$$

we have $(D - cI)u = v$. ∎
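The proof of Lemma 1 is constructive, so it can be imitated numerically. In the sketch below, the choices of $c$ and $v$ are hypothetical, and the antiderivative is taken to be $W(t) = \int_0^t v(s)e^{-cs}\,ds$, approximated with the composite Simpson rule (a standard quadrature method swapped in here; the proof only needs some antiderivative). The sketch then sets $u(t) = W(t)e^{ct}$ and checks at sample points that $u' - cu \approx v$, with $u'$ approximated by a central difference.

```python
import cmath
import math

c = complex(1.0, 2.0)  # hypothetical choice of the constant c

def v(t):
    """A hypothetical right-hand side v in C^infinity."""
    return math.cos(t) + 1j * t

def W(t, n=200):
    """W(t) = integral from 0 to t of v(s) e^{-cs} ds, by composite Simpson (n even)."""
    h = t / n
    w = lambda s: v(s) * cmath.exp(-c * s)
    total = w(0) + w(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * w(k * h)
    return total * h / 3

def u(t):
    """The function constructed in the proof: u(t) = W(t) e^{ct}."""
    return W(t) * cmath.exp(c * t)

def check(t, h=1e-4):
    """Return |u'(t) - c u(t) - v(t)|, with u' approximated by a central difference."""
    u_prime = (u(t + h) - u(t - h)) / (2 * h)
    return abs(u_prime - c * u(t) - v(t))

for t in (0.5, 0.9, 1.7):
    print(t, check(t))  # small, so (D - cI)u = v up to numerical error
```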


Lemma 2. Let V be a vector space, and suppose that T and U are 
linear operators on V such that U is onto and the null spaces of T and U are 
finite-dimensional. Then the null space of TU is finite-dimensional, and 


$$\dim(N(TU)) = \dim(N(T)) + \dim(N(U)).$$


Proof. Let $p = \dim(N(T))$, $q = \dim(N(U))$, and $\{u_1, u_2, \ldots, u_p\}$ and $\{v_1, v_2, \ldots, v_q\}$ be bases for $N(T)$ and $N(U)$, respectively. Since $U$ is onto, we can choose for each $i$ ($1 \le i \le p$) a vector $w_i \in V$ such that $U(w_i) = u_i$. Note that the $w_i$'s are distinct. Furthermore, for any $i$ and $j$, $w_i \ne v_j$, for otherwise $u_i = U(w_i) = U(v_j) = 0$, a contradiction since $u_i$ is a basis vector and hence nonzero. Hence the set

$$\beta = \{w_1, w_2, \ldots, w_p, v_1, v_2, \ldots, v_q\}$$

contains $p + q$ distinct vectors. To complete the proof of the lemma, it suffices to show that $\beta$ is a basis for $N(TU)$.
We first show that $\beta$ generates $N(TU)$. Since for any $w_i$ and $v_j$ in $\beta$, $TU(w_i) = T(u_i) = 0$ and $TU(v_j) = T(0) = 0$, it follows that $\beta \subseteq N(TU)$. Now suppose that $v \in N(TU)$. Then $0 = TU(v) = T(U(v))$. Thus $U(v) \in N(T)$. So there exist scalars $a_1, a_2, \ldots, a_p$ such that

$$\begin{aligned}
U(v) &= a_1u_1 + a_2u_2 + \cdots + a_pu_p \\
&= a_1U(w_1) + a_2U(w_2) + \cdots + a_pU(w_p) \\
&= U(a_1w_1 + a_2w_2 + \cdots + a_pw_p).
\end{aligned}$$

Hence

$$U(v - (a_1w_1 + a_2w_2 + \cdots + a_pw_p)) = 0.$$

Consequently, $v - (a_1w_1 + a_2w_2 + \cdots + a_pw_p)$ lies in $N(U)$. It follows that there exist scalars $b_1, b_2, \ldots, b_q$ such that

$$v - (a_1w_1 + a_2w_2 + \cdots + a_pw_p) = b_1v_1 + b_2v_2 + \cdots + b_qv_q$$

or

$$v = a_1w_1 + a_2w_2 + \cdots + a_pw_p + b_1v_1 + b_2v_2 + \cdots + b_qv_q.$$

Therefore $\beta$ spans $N(TU)$.

To prove that $\beta$ is linearly independent, let $a_1, a_2, \ldots, a_p, b_1, b_2, \ldots, b_q$ be any scalars such that

$$a_1w_1 + a_2w_2 + \cdots + a_pw_p + b_1v_1 + b_2v_2 + \cdots + b_qv_q = 0. \tag{6}$$

Applying $U$ to both sides of (6), we obtain

$$a_1u_1 + a_2u_2 + \cdots + a_pu_p = 0.$$

Since $\{u_1, u_2, \ldots, u_p\}$ is linearly independent, the $a_i$'s are all zero. Thus (6) reduces to

$$b_1v_1 + b_2v_2 + \cdots + b_qv_q = 0.$$

Again, the linear independence of $\{v_1, v_2, \ldots, v_q\}$ implies that the $b_i$'s are all zero. We conclude that $\beta$ is a basis for $N(TU)$. Hence $N(TU)$ is finite-dimensional, and $\dim(N(TU)) = p + q = \dim(N(T)) + \dim(N(U))$. ∎


Proof of Theorem 2.32. The proof is by mathematical induction on the order of the differential operator $p(D)$. The first-order case coincides with Theorem 2.30. For some integer $n > 1$, suppose that Theorem 2.32 holds for any differential operator of order less than $n$, and consider a differential operator $p(D)$ of order $n$. The polynomial $p(t)$ can be factored into a product of two polynomials as follows:

$$p(t) = q(t)(t - c),$$

where $q(t)$ is a polynomial of degree $n - 1$ and $c$ is a complex number. Thus the given differential operator may be rewritten as

$$p(D) = q(D)(D - cI).$$

Now, by Lemma 1, $D - cI$ is onto, and by the corollary to Theorem 2.30, $\dim(N(D - cI)) = 1$. Also, by the induction hypothesis, $\dim(N(q(D))) = n - 1$. Thus, by Lemma 2, we conclude that

$$\dim(N(p(D))) = \dim(N(q(D))) + \dim(N(D - cI)) = (n - 1) + 1 = n.$$

∎


Corollary. The solution space of any $n$th-order homogeneous linear differential equation with constant coefficients is an $n$-dimensional subspace of $C^\infty$.


The corollary to Theorem 2.32 reduces the problem of finding all solutions 
to an nth-order homogeneous linear differential equation with constant coeffi- 
cients to finding a set of n linearly independent solutions to the equation. By 
the results of Chapter 1, any such set must be a basis for the solution space. 
The next theorem enables us to find a basis quickly for many such equations. 
Hints for its proof are provided in the exercises. 


Theorem 2.33. Given $n$ distinct complex numbers $c_1, c_2, \ldots, c_n$, the set of exponential functions $\{e^{c_1t}, e^{c_2t}, \ldots, e^{c_nt}\}$ is linearly independent.

Proof. Exercise. (See Exercise 10.) ∎


Corollary. For any $n$th-order homogeneous linear differential equation with constant coefficients, if the auxiliary polynomial has $n$ distinct zeros $c_1, c_2, \ldots, c_n$, then $\{e^{c_1t}, e^{c_2t}, \ldots, e^{c_nt}\}$ is a basis for the solution space of the differential equation.


Proof. Exercise. (See Exercise 10.) ∎


Example 5 


We find all solutions to the differential equation 


$$y'' + 5y' + 4y = 0.$$

Since the auxiliary polynomial factors as $(t + 4)(t + 1)$, it has two distinct zeros, $-1$ and $-4$. Thus $\{e^{-t}, e^{-4t}\}$ is a basis for the solution space. So any solution to the given equation is of the form

$$y(t) = b_1e^{-t} + b_2e^{-4t}$$

for unique scalars $b_1$ and $b_2$. ◆
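Because the scalars $b_1$ and $b_2$ in Example 5 are unique, two additional conditions determine them. The sketch below assumes hypothetical initial conditions $y(0) = y_0$ and $y'(0) = v_0$ (they are not part of the example). Since $y(0) = b_1 + b_2$ and $y'(0) = -b_1 - 4b_2$, this is a $2 \times 2$ linear system, solved here by Cramer's rule. The final line checks $y'' + 5y' + 4y$ using the exact derivatives of the basis functions.

```python
import math

def coefficients(y0, v0):
    """Solve b1 + b2 = y0 and -b1 - 4*b2 = v0 by Cramer's rule."""
    det = 1 * (-4) - 1 * (-1)          # = -3, nonzero, so the solution is unique
    b1 = (y0 * (-4) - 1 * v0) / det
    b2 = (1 * v0 - (-1) * y0) / det
    return b1, b2

b1, b2 = coefficients(1.0, 0.0)        # hypothetical data: y(0) = 1, y'(0) = 0

def y(t):
    return b1 * math.exp(-t) + b2 * math.exp(-4 * t)

def dy(t):
    return -b1 * math.exp(-t) - 4 * b2 * math.exp(-4 * t)

def d2y(t):
    return b1 * math.exp(-t) + 16 * b2 * math.exp(-4 * t)

print(b1, b2)                               # 4/3 and -1/3, as floats
print(d2y(0.7) + 5 * dy(0.7) + 4 * y(0.7))  # zero up to rounding
```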


Example 6 


We find all solutions to the differential equation 
$$y'' + 9y = 0.$$

The auxiliary polynomial $t^2 + 9$ factors as $(t - 3i)(t + 3i)$ and hence has distinct zeros $c_1 = 3i$ and $c_2 = -3i$. Thus $\{e^{3it}, e^{-3it}\}$ is a basis for the solution space. Since

$$\cos 3t = \frac{1}{2}(e^{3it} + e^{-3it}) \quad \text{and} \quad \sin 3t = \frac{1}{2i}(e^{3it} - e^{-3it}),$$

it follows from Exercise 7 that $\{\cos 3t, \sin 3t\}$ is also a basis for this solution space. This basis has an advantage over the original one because it consists of the familiar sine and cosine functions and makes no reference to the imaginary number $i$. Using this latter basis, we see that any solution to the given equation is of the form

$$y(t) = b_1\cos 3t + b_2\sin 3t$$

for unique scalars $b_1$ and $b_2$. ◆
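The identities in Example 6, and the change of basis they induce, can be spot-checked with the standard cmath module. In the sketch below, the helper names are hypothetical and the sample values are arbitrary. The second function converts $b_1\cos 3t + b_2\sin 3t$ into coordinates relative to $\{e^{3it}, e^{-3it}\}$; substituting the two identities gives the coefficients $(b_1 - ib_2)/2$ and $(b_1 + ib_2)/2$.

```python
import cmath
import math

def cos_sin_from_exponentials(t):
    """Return ((e^{3it} + e^{-3it})/2, (e^{3it} - e^{-3it})/(2i))."""
    e_plus, e_minus = cmath.exp(3j * t), cmath.exp(-3j * t)
    return (e_plus + e_minus) / 2, (e_plus - e_minus) / 2j

def to_exponential_coords(b1, b2):
    """Coordinates of b1 cos 3t + b2 sin 3t relative to {e^{3it}, e^{-3it}}."""
    return (b1 - 1j * b2) / 2, (b1 + 1j * b2) / 2

for t in (0.0, 0.4, 1.9):
    c, s = cos_sin_from_exponentials(t)
    print(t, abs(c - math.cos(3 * t)), abs(s - math.sin(3 * t)))  # both close to 0

a1, a2 = to_exponential_coords(2.0, -1.0)
t = 0.4
print(a1 * cmath.exp(3j * t) + a2 * cmath.exp(-3j * t), 2 * math.cos(3 * t) - math.sin(3 * t))
```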
Next consider the differential equation

$$y'' + 2y' + y = 0,$$

for which the auxiliary polynomial is $(t + 1)^2$. By Theorem 2.31, $e^{-t}$ is a solution to this equation. By the corollary to Theorem 2.32, its solution space is two-dimensional. In order to obtain a basis for the solution space, we need a solution that is linearly independent of $e^{-t}$. The reader can verify that $te^{-t}$ is such a solution. The following lemma extends this result.


Lemma. For a given complex number $c$ and positive integer $n$, suppose that $(t - c)^n$ is the auxiliary polynomial of a homogeneous linear differential equation with constant coefficients. Then the set

$$\beta = \{e^{ct}, te^{ct}, \ldots, t^{n-1}e^{ct}\}$$

is a basis for the solution space of the equation.
Proof. Since the solution space is $n$-dimensional, we need only show that $\beta$ is linearly independent and lies in the solution space. First, observe that for any positive integer $k$,

$$\begin{aligned}
(D - cI)(t^ke^{ct}) &= kt^{k-1}e^{ct} + ct^ke^{ct} - ct^ke^{ct} \\
&= kt^{k-1}e^{ct}.
\end{aligned}$$

Thus each application of $D - cI$ lowers the power of $t$ by one, so $(D - cI)^k(t^ke^{ct}) = k!\,e^{ct}$, which one more application of $D - cI$ sends to 0 by the corollary to Theorem 2.30. Hence for $k < n$,

$$(D - cI)^n(t^ke^{ct}) = 0.$$

It follows that $\beta$ is a subset of the solution space.

We next show that $\beta$ is linearly independent. Consider any linear combination of vectors in $\beta$ such that

$$b_0e^{ct} + b_1te^{ct} + \cdots + b_{n-1}t^{n-1}e^{ct} = 0 \tag{7}$$

for some scalars $b_0, b_1, \ldots, b_{n-1}$. Dividing by $e^{ct}$ (which is never zero) in (7), we obtain

$$b_0 + b_1t + \cdots + b_{n-1}t^{n-1} = 0. \tag{8}$$

Thus the left side of (8) must be the zero polynomial function. We conclude that the coefficients $b_0, b_1, \ldots, b_{n-1}$ are all zero. So $\beta$ is linearly independent and hence is a basis for the solution space. ∎
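The computation at the heart of the proof, $(D - cI)(t^ke^{ct}) = kt^{k-1}e^{ct}$, says that on functions of the form $q(t)e^{ct}$ with $q$ a polynomial, the operator $D - cI$ simply differentiates $q$. The sketch below (hypothetical helper names) exploits this by storing $q(t)e^{ct}$ as the coefficient list of $q$; notice that the value of $c$ never enters. It confirms that $(D - cI)^n$ sends $t^ke^{ct}$ to 0 for $k < n$ and sends $t^ne^{ct}$ to $n!\,e^{ct}$.

```python
from math import factorial

def d_minus_c(q):
    """(D - cI)(q(t) e^{ct}) = q'(t) e^{ct}.

    q is a coefficient list, lowest degree first. D(q e^{ct}) = (q' + c q) e^{ct},
    and the c q term cancels against -cI, so c does not appear.
    """
    return [k * q[k] for k in range(1, len(q))]

def apply_power(q, n):
    """Apply (D - cI) to q(t) e^{ct} n times; [] stands for the zero function."""
    for _ in range(n):
        q = d_minus_c(q)
    return q

def monomial(k):
    """Coefficient list of t^k."""
    return [0] * k + [1]

n = 4
for k in range(n + 1):
    print(k, apply_power(monomial(k), n))
print(apply_power(monomial(n), n) == [factorial(n)])
```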
Example 7
We find all solutions to the differential equation

$$y^{(4)} - 4y^{(3)} + 6y^{(2)} - 4y^{(1)} + y = 0.$$

Since the auxiliary polynomial is

$$t^4 - 4t^3 + 6t^2 - 4t + 1 = (t - 1)^4,$$

we can immediately conclude by the preceding lemma that $\{e^t, te^t, t^2e^t, t^3e^t\}$ is a basis for the solution space. So any solution $y$ to the given differential equation is of the form

$$y(t) = b_1e^t + b_2te^t + b_3t^2e^t + b_4t^3e^t$$

for unique scalars $b_1$, $b_2$, $b_3$, and $b_4$. ◆
The most general situation is stated in the following theorem. 


Theorem 2.34. Given a homogeneous linear differential equation with 
constant coefficients and auxiliary polynomial 


$$(t - c_1)^{n_1}(t - c_2)^{n_2} \cdots (t - c_k)^{n_k},$$

where $n_1, n_2, \ldots, n_k$ are positive integers and $c_1, c_2, \ldots, c_k$ are distinct complex numbers, the following set is a basis for the solution space of the equation:

$$\{e^{c_1t}, te^{c_1t}, \ldots, t^{n_1-1}e^{c_1t}, \ldots, e^{c_kt}, te^{c_kt}, \ldots, t^{n_k-1}e^{c_kt}\}.$$

Proof. Exercise. ∎


Example 8 


The differential equation 


$$y^{(3)} - 4y^{(2)} + 5y^{(1)} - 2y = 0$$

has the auxiliary polynomial

$$t^3 - 4t^2 + 5t - 2 = (t - 1)^2(t - 2).$$

By Theorem 2.34, $\{e^t, te^t, e^{2t}\}$ is a basis for the solution space of the differential equation. Thus any solution $y$ has the form

$$y(t) = b_1e^t + b_2te^t + b_3e^{2t}$$

for unique scalars $b_1$, $b_2$, and $b_3$. ◆
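Theorem 2.34 reduces the search for a basis to bookkeeping on the zeros of $p(t)$. The sketch below (hypothetical helper names; the zeros and their multiplicities are supplied by hand rather than computed) lists pairs $(j, c)$ standing for the functions $t^je^{ct}$ and checks each one against $p(D)$ exactly, using $D(q(t)e^{ct}) = (q'(t) + cq(t))e^{ct}$ on the polynomial factor $q$. It is applied to Example 8; the same helpers also accept complex zeros such as $\pm 3i$ from Example 6.

```python
def d_on(q, c):
    """D(q(t) e^{ct}) = (q'(t) + c q(t)) e^{ct}; q is a coefficient list, lowest degree first."""
    out = [c * a for a in q]
    for k in range(1, len(q)):
        out[k - 1] += k * q[k]
    return out

def p_of_d(p, q, c):
    """Polynomial factor of p(D)(q(t) e^{ct}), where p lists a_0, a_1, ..., a_n."""
    total = [0] * len(q)
    power = list(q)                      # D^0 applied to q(t) e^{ct}
    for coef in p:
        total = [x + coef * y for x, y in zip(total, power)]
        power = d_on(power, c)           # next power of D
    return total

def basis(zeros):
    """Theorem 2.34: zeros maps each distinct zero c to its multiplicity.
    Each pair (j, c) stands for the function t^j e^{ct}, 0 <= j < multiplicity."""
    return [(j, c) for c, mult in zeros.items() for j in range(mult)]

p = [-2, 5, -4, 1]          # t^3 - 4t^2 + 5t - 2 from Example 8, lowest degree first
zeros = {1: 2, 2: 1}        # (t - 1)^2 (t - 2)
for j, c in basis(zeros):
    q = [0] * j + [1]       # polynomial factor t^j
    print(f"t^{j} e^({c}t):", p_of_d(p, q, c))  # an all-zero list means p(D) kills it
```

For contrast, $te^{2t}$ is not a solution: the same computation gives $p(D)(te^{2t}) = e^{2t}$, because $2$ is only a simple zero of $p(t)$.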


EXERCISES 


1. Label the following statements as true or false. 


(a) The set of solutions to an $n$th-order homogeneous linear differential equation with constant coefficients is an $n$-dimensional subspace of $C^\infty$.

(b) The solution space of a homogeneous linear differential equation with constant coefficients is the null space of a differential operator.

(c) The auxiliary polynomial of a homogeneous linear differential equation with constant coefficients is a solution to the differential equation.

(d) Any solution to a homogeneous linear differential equation with constant coefficients is of the form $ae^{ct}$ or $at^ke^{ct}$, where $a$ and $c$ are complex numbers and $k$ is a positive integer.

(e) Any linear combination of solutions to a given homogeneous linear differential equation with constant coefficients is also a solution to the given equation.

(f) For any homogeneous linear differential equation with constant coefficients having auxiliary polynomial $p(t)$, if $c_1, c_2, \ldots, c_k$ are the distinct zeros of $p(t)$, then $\{e^{c_1t}, e^{c_2t}, \ldots, e^{c_kt}\}$ is a basis for the solution space of the given differential equation.

(g) Given any polynomial $p(t) \in P(\mathbb{C})$, there exists a homogeneous linear differential equation with constant coefficients whose auxiliary polynomial is $p(t)$.
2. For each of the following parts, determine whether the statement is true or false. Justify your claim with either a proof or a counterexample, whichever is appropriate.


(a) Any finite-dimensional subspace of $C^\infty$ is the solution space of a homogeneous linear differential equation with constant coefficients.

(b) There exists a homogeneous linear differential equation with constant coefficients whose solution space has the basis $\{t, t^2\}$.

(c) For any homogeneous linear differential equation with constant coefficients, if $x$ is a solution to the equation, so is its derivative $x'$.

Given two polynomials $p(t)$ and $q(t)$ in $P(\mathbb{C})$, if $x \in N(p(D))$ and $y \in N(q(D))$, then

(d) $x + y \in N(p(D)q(D))$.
(e) $xy \in N(p(D)q(D))$.


3. Find a basis for the solution space of each of the following differential equations.

(a) $y'' + 2y' + y = 0$
(b) …
(c) …
(d) …


4. Find a basis for each of the following subspaces of $C^\infty$.

(a) $N(D^2 - D - I)$

(b) $N(D^3 - 3D^2 + 3D - I)$

(c) $N(D^3 + 6D^2 + 8D)$


5. Show that $C^\infty$ is a subspace of $\mathcal{F}(\mathbb{R}, \mathbb{C})$.


6. (a) Show that $D\colon C^\infty \to C^\infty$ is a linear operator.
(b) Show that any differential operator is a linear operator on $C^\infty$.


7. Prove that if $\{x, y\}$ is a basis for a vector space over $\mathbb{C}$, then so is

$$\left\{\frac{1}{2}(x + y), \frac{1}{2i}(x - y)\right\}.$$


8. Consider a second-order homogeneous linear differential equation with constant coefficients in which the auxiliary polynomial has distinct conjugate complex roots $a + ib$ and $a - ib$, where $a, b \in \mathbb{R}$. Show that $\{e^{at}\cos bt, e^{at}\sin bt\}$ is a basis for the solution space.
9. Suppose that $\{U_1, U_2, \ldots, U_n\}$ is a collection of pairwise commutative linear operators on a vector space $V$ (i.e., operators such that $U_iU_j = U_jU_i$ for all $i, j$). Prove that, for any $i$ ($1 \le i \le n$),

$$N(U_i) \subseteq N(U_1U_2 \cdots U_n).$$


10. Prove Theorem 2.33 and its corollary. Hint: Suppose that

$$b_1e^{c_1t} + b_2e^{c_2t} + \cdots + b_ne^{c_nt} = 0 \quad (\text{where the } c_i\text{'s are distinct}).$$

To show the $b_i$'s are zero, apply mathematical induction on $n$ as follows. Verify the theorem for $n = 1$. Assuming that the theorem is true for $n - 1$ functions, apply the operator $D - c_nI$ to both sides of the given equation to establish the theorem for $n$ distinct exponential functions.


11. Prove Theorem 2.34. Hint: First verify that the alleged basis lies in the solution space. Then verify that this set is linearly independent by mathematical induction on $k$ as follows. The case $k = 1$ is the lemma to Theorem 2.34. Assuming that the theorem holds for $k - 1$ distinct $c_i$'s, apply the operator $(D - c_kI)^{n_k}$ to any linear combination of the alleged basis that equals 0.


12. Let $V$ be the solution space of an $n$th-order homogeneous linear differential equation with constant coefficients having auxiliary polynomial $p(t)$. Prove that if $p(t) = g(t)h(t)$, where $g(t)$ and $h(t)$ are polynomials of positive degree, then

$$N(h(D)) = R(g(D_V)) = g(D)(V),$$

where $D_V\colon V \to V$ is defined by $D_V(x) = x'$ for $x \in V$. Hint: First prove $g(D)(V) \subseteq N(h(D))$. Then prove that the two spaces have the same finite dimension.


A differential equation 
y™) + An—1y"—)) at ee ayy) + aoy =x 


is called a nonhomogeneous linear differential equation with constant 
coefficients if the a,’s are constant and z is a function that is not iden- 
tically zero. 


(a) Prove that for any x € C™ there exists y € C™ such that y is 
a solution to the differential equation. Hint: Use Lemma 1 to 
Theorem 2.32 to show that for any polynomial p(t), the linear 
operator p(D): C%° — C® is onto. 
(b) Let V be the solution space for the homogeneous linear equation
$$y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y^{(1)} + a_0y = 0.$$
Prove that if z is any solution to the associated nonhomogeneous
linear differential equation, then the set of all solutions to the
nonhomogeneous linear differential equation is
$$\{z + y : y \in V\}.$$


14. Given any nth-order homogeneous linear differential equation with
constant coefficients, prove that, for any solution x and any $t_0 \in R$, if
$x(t_0) = x'(t_0) = \cdots = x^{(n-1)}(t_0) = 0$, then x = 0 (the zero function).
Hint: Use mathematical induction on n as follows. First prove the
conclusion for the case n = 1. Next suppose that it is true for equations of
order n − 1, and consider an nth-order differential equation with auxiliary
polynomial p(t). Factor $p(t) = q(t)(t - c)$, and let $z = q(D)x$.
Show that $z(t_0) = 0$ and $z' - cz = 0$ to conclude that z = 0. Now apply
the induction hypothesis.


15. Let V be the solution space of an nth-order homogeneous linear
differential equation with constant coefficients. Fix $t_0 \in R$, and define a
mapping $\Phi: V \to C^n$ by
$$\Phi(x) = \begin{pmatrix} x(t_0) \\ x'(t_0) \\ \vdots \\ x^{(n-1)}(t_0) \end{pmatrix} \quad \text{for each } x \text{ in } V.$$


(a) Prove that Φ is linear and its null space is the zero subspace of V.
Deduce that Φ is an isomorphism. Hint: Use Exercise 14.

(b) Prove the following: For any nth-order homogeneous linear
differential equation with constant coefficients, any $t_0 \in R$, and any
complex numbers $c_0, c_1, \ldots, c_{n-1}$ (not necessarily distinct), there
exists exactly one solution, x, to the given differential equation
such that $x(t_0) = c_0$ and $x^{(k)}(t_0) = c_k$ for $k = 1, 2, \ldots, n - 1$.


16. Pendular Motion. It is well known that the motion of a pendulum is
approximated by the differential equation
$$\theta'' + \frac{g}{l}\theta = 0,$$
where $\theta(t)$ is the angle in radians that the pendulum makes with a
vertical line at time t (see Figure 2.8), interpreted so that θ is positive
if the pendulum is to the right and negative if the pendulum is to the
left of the vertical line as viewed by the reader. Here l is the length
of the pendulum and g is the magnitude of acceleration due to gravity.
The variable t and constants l and g must be in compatible units (e.g.,
t in seconds, l in meters, and g in meters per second per second).

Figure 2.8


(a) Express an arbitrary solution to this equation as a linear
combination of two real-valued solutions.

(b) Find the unique solution to the equation that satisfies the
conditions
$$\theta(0) = \theta_0 > 0 \quad \text{and} \quad \theta'(0) = 0.$$
(The significance of these conditions is that at time t = 0 the
pendulum is released from a position displaced from the vertical
by $\theta_0$.)

(c) Prove that it takes $2\pi\sqrt{l/g}$ units of time for the pendulum to make
one circuit back and forth. (This time is called the period of the
pendulum.)


17. Periodic Motion of a Spring without Damping. Find the general
solution to (3), which describes the periodic motion of a spring, ignoring
frictional forces.


18. Periodic Motion of a Spring with Damping. The ideal periodic motion
described by solutions to (3) is due to the ignoring of frictional forces. 
In reality, however, there is a frictional force acting on the motion that 
is proportional to the speed of motion, but that acts in the opposite 
direction. The modification of (3) to account for the frictional force, 
called the damping force, is given by 


my" + ry’ + ky = 0, 


where r > 0 is the proportionality constant. 


(a) Find the general solution to this equation.

(b) Find the unique solution in (a) that satisfies the initial conditions
$y(0) = 0$ and $y'(0) = v_0$, the initial velocity.

(c) For y(t) as in (b), show that the amplitude of the oscillation
decreases to zero; that is, prove that $\lim_{t\to\infty} y(t) = 0$.


19. In our study of differential equations, we have regarded solutions as
complex-valued functions even though functions that are useful in
describing physical motion are real-valued. Justify this approach.


20. The following parts, which do not involve linear algebra, are included
for the sake of completeness.


(a) Prove Theorem 2.27. Hint: Use mathematical induction on the
number of derivatives possessed by a solution.

(b) For any $c, d \in C$, prove that

(c) Prove Theorem 2.28.

(d) Prove Theorem 2.29.

(e) Prove the product rule for differentiating complex-valued functions
of a real variable: For any differentiable functions x and
y in $F(R, C)$, the product xy is differentiable and
$$(xy)' = x'y + xy'.$$
Hint: Apply the rules of differentiation to the real and imaginary
parts of xy.

(f) Prove that if $x \in F(R, C)$ and $x' = 0$, then x is a constant function.
Elementary Matrix Operations and Systems of Linear Equations


3.1 Elementary Matrix Operations and Elementary Matrices
3.2 The Rank of a Matrix and Matrix Inverses
3.3 Systems of Linear Equations—Theoretical Aspects
3.4 Systems of Linear Equations—Computational Aspects


This chapter is devoted to two related objectives: 


1. the study of certain “rank-preserving” operations on matrices; 
2. the application of these operations and the theory of linear transforma- 
tions to the solution of systems of linear equations. 


As a consequence of objective 1, we obtain a simple method for com- 
puting the rank of a linear transformation between finite-dimensional vector 
spaces by applying these rank-preserving matrix operations to a matrix that 
represents that transformation. 

Solving a system of linear equations is probably the most important ap- 
plication of linear algebra. The familiar method of elimination for solving 
systems of linear equations, which was discussed in Section 1.4, involves the 
elimination of variables so that a simpler system can be obtained. The tech- 
nique by which the variables are eliminated utilizes three types of operations: 


1. interchanging any two equations in the system; 
2. multiplying any equation in the system by a nonzero constant; 
3. adding a multiple of one equation to another. 


In Section 3.3, we express a system of linear equations as a single matrix 
equation. In this representation of the system, the three operations above 
are the “elementary row operations” for matrices. These operations provide 
a convenient computational method for determining all solutions to a system 
of linear equations. 
3.1 ELEMENTARY MATRIX OPERATIONS AND ELEMENTARY MATRICES


In this section, we define the elementary operations that are used throughout 
the chapter. In subsequent sections, we use these operations to obtain simple 
computational methods for determining the rank of a linear transformation 
and the solution of a system of linear equations. There are two types of el- 
ementary matrix operations—row operations and column operations. As we 
will see, the row operations are more useful. They arise from the three opera- 
tions that can be used to eliminate variables in a system of linear equations. 


Definitions. Let A be an m × n matrix. Any one of the following
three operations on the rows [columns] of A is called an elementary row
[column] operation:


(1) interchanging any two rows [columns] of A;
(2) multiplying any row [column] of A by a nonzero scalar;
(3) adding any scalar multiple of a row [column] of A to another row [column].

Any of these three operations is called an elementary operation. Elementary
operations are of type 1, type 2, or type 3 depending on whether they
are obtained by (1), (2), or (3).


Example 1 
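Let
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & -1 & 3 \\ 4 & 0 & 1 & 2 \end{pmatrix}.$$
Interchanging the second row of A with the first row is an example of an
elementary row operation of type 1. The resulting matrix is
$$B = \begin{pmatrix} 2 & 1 & -1 & 3 \\ 1 & 2 & 3 & 4 \\ 4 & 0 & 1 & 2 \end{pmatrix}.$$
Multiplying the second column of A by 3 is an example of an elementary
column operation of type 2. The resulting matrix is
$$C = \begin{pmatrix} 1 & 6 & 3 & 4 \\ 2 & 3 & -1 & 3 \\ 4 & 0 & 1 & 2 \end{pmatrix}.$$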
Adding 4 times the third row of A to the first row is an example of an
elementary row operation of type 3. In this case, the resulting matrix is
$$M = \begin{pmatrix} 17 & 2 & 7 & 12 \\ 2 & 1 & -1 & 3 \\ 4 & 0 & 1 & 2 \end{pmatrix}.$$
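To make the three types concrete, here is a minimal Python sketch (our own illustration, not part of the text) that performs the operations of Example 1 on A stored as a list of rows. The helper names `swap_rows`, `scale_row`, `add_row_multiple`, `transpose`, and `scale_col` are hypothetical, and indices are 0-based, so "the third row" of the text is index 2. The column operation is done by transposing, operating on a row, and transposing back.

```python
def swap_rows(A, i, j):
    """Type 1: copy of A with rows i and j interchanged."""
    B = [row[:] for row in A]
    B[i], B[j] = B[j], B[i]
    return B

def scale_row(A, i, c):
    """Type 2: copy of A with row i multiplied by a nonzero scalar c."""
    if c == 0:
        raise ValueError("type 2 operations require a nonzero scalar")
    B = [row[:] for row in A]
    B[i] = [c * x for x in B[i]]
    return B

def add_row_multiple(A, src, dst, c):
    """Type 3: copy of A with c times row src added to row dst (src != dst)."""
    if src == dst:
        raise ValueError("type 3 operations use two different rows")
    B = [row[:] for row in A]
    B[dst] = [x + c * y for x, y in zip(B[dst], B[src])]
    return B

def transpose(A):
    return [list(col) for col in zip(*A)]

def scale_col(A, j, c):
    """Column version of type 2: transpose, scale row j, transpose back."""
    return transpose(scale_row(transpose(A), j, c))

A = [[1, 2, 3, 4],
     [2, 1, -1, 3],
     [4, 0, 1, 2]]

B = swap_rows(A, 0, 1)            # interchange rows 1 and 2
C = scale_col(A, 1, 3)            # multiply column 2 by 3
M = add_row_multiple(A, 2, 0, 4)  # add 4 times row 3 to row 1
print(B)  # [[2, 1, -1, 3], [1, 2, 3, 4], [4, 0, 1, 2]]
print(C)  # [[1, 6, 3, 4], [2, 3, -1, 3], [4, 0, 1, 2]]
print(M)  # [[17, 2, 7, 12], [2, 1, -1, 3], [4, 0, 1, 2]]
# Undo the type 3 operation by adding -4 times row 3 of M to row 1.
print(add_row_multiple(M, 2, 0, -4) == A)  # True
```

Each helper returns a new matrix rather than mutating its argument, which keeps A available for comparison after every operation.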


Notice that if a matrix Q can be obtained from a matrix P by means of an 
elementary row operation, then P can be obtained from Q by an elementary 
row operation of the same type. (See Exercise 8.) So, in Example 1, A can 
be obtained from M by adding −4 times the third row of M to the first row
of M. 


Definition. An n × n elementary matrix is a matrix obtained by
performing an elementary operation on $I_n$. The elementary matrix is said
to be of type 1, 2, or 3 according to whether the elementary operation
performed on $I_n$ is a type 1, 2, or 3 operation, respectively.


For example, interchanging the first two rows of $I_3$ produces the
elementary matrix
$$E = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note that E can also be obtained by interchanging the first two columns of
$I_3$. In fact, any elementary matrix can be obtained in at least two ways:
either by performing an elementary row operation on $I_n$ or by performing an
elementary column operation on $I_n$. (See Exercise 4.) Similarly,
$$\begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
is an elementary matrix since it can be obtained from $I_3$ by an elementary
column operation of type 3 (adding −2 times the first column of $I_3$ to the
third column) or by an elementary row operation of type 3 (adding −2 times
the third row to the first row).

Our first theorem shows that performing an elementary row operation on 
a matrix is equivalent to multiplying the matrix by an elementary matrix. 


Theorem 3.1. Let $A \in M_{m\times n}(F)$, and suppose that B is obtained from
A by performing an elementary row [column] operation. Then there exists an
m × m [n × n] elementary matrix E such that B = EA [B = AE]. In fact,
E is obtained from $I_m$ [$I_n$] by performing the same elementary row [column]
operation as that which was performed on A to obtain B. Conversely, if E is
an elementary m × m [n × n] matrix, then EA [AE] is the matrix obtained
from A by performing the same elementary row [column] operation as that
which produces E from $I_m$ [$I_n$].


The proof, which we omit, requires verifying Theorem 3.1 for each type 
of elementary row operation. The proof for column operations can then be 
obtained by using the matrix transpose to transform a column operation into 
a row operation. The details are left as an exercise. (See Exercise 7.) 

The next example illustrates the use of the theorem. 


Example 2 


Consider the matrices A and B in Example 1. In this case, B is obtained from
A by interchanging the first two rows of A. Performing this same operation
on $I_3$, we obtain the elementary matrix
$$E = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Note that EA = B.


In the second part of Example 1, C is obtained from A by multiplying the
second column of A by 3. Performing this same operation on $I_4$, we obtain
the elementary matrix
$$E = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Observe that AE = C.
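Theorem 3.1 can be checked numerically on the matrices of Example 2. The following Python sketch is an illustration under our own naming (`identity`, `matmul`, and the `elementary_*` builders are hypothetical helpers). It builds each elementary matrix by applying the operation to the identity and confirms that multiplying on the left performs the row operation, while multiplying on the right performs the column operation. It also covers the type 3 operation that produced M in Example 1.

```python
def identity(n):
    """The n x n identity matrix I_n as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Product of a p x q matrix X and a q x s matrix Y."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def elementary_swap(n, i, j):
    """Type 1: interchange rows i and j of I_n."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def elementary_scale(n, i, c):
    """Type 2: multiply row (equivalently, column) i of I_n by c != 0."""
    E = identity(n)
    E[i][i] = c
    return E

def elementary_add(n, src, dst, c):
    """Type 3: add c times row src of I_n to row dst, i.e. set entry (dst, src) to c."""
    E = identity(n)
    E[dst][src] = c
    return E

A = [[1, 2, 3, 4],
     [2, 1, -1, 3],
     [4, 0, 1, 2]]

EA = matmul(elementary_swap(3, 0, 1), A)    # row operation: multiply on the left
AE = matmul(A, elementary_scale(4, 1, 3))   # column operation: multiply on the right
EM = matmul(elementary_add(3, 2, 0, 4), A)  # type 3 row operation from Example 1
print(EA)  # [[2, 1, -1, 3], [1, 2, 3, 4], [4, 0, 1, 2]]
print(AE)  # [[1, 6, 3, 4], [2, 3, -1, 3], [4, 0, 1, 2]]
print(EM)  # [[17, 2, 7, 12], [2, 1, -1, 3], [4, 0, 1, 2]]
```

Note that the row operation uses a 3 × 3 elementary matrix while the column operation uses a 4 × 4 one, matching the sizes m × m and n × n in Theorem 3.1.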


It is a useful fact that the inverse of an elementary matrix is also an 
elementary matrix. 


Theorem 3.2. Elementary matrices are invertible, and the inverse of an 
elementary matrix is an elementary matrix of the same type. 


Proof. Let E be an elementary n × n matrix. Then E can be obtained by
an elementary row operation on $I_n$. By reversing the steps used to transform
$I_n$ into E, we can transform E back into $I_n$. The result is that $I_n$ can
be obtained from E by an elementary row operation of the same type. By
Theorem 3.1, there is an elementary matrix $\overline{E}$ such that $\overline{E}E = I_n$. Therefore,
by Exercise 10 of Section 2.4, E is invertible and $E^{-1} = \overline{E}$. ∎
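The proof of Theorem 3.2 obtains the inverse by reversing the row operation. A swap undoes itself, scaling a row by c is undone by scaling it by 1/c, and adding c times row src to row dst is undone by adding −c times row src. The following small Python sketch follows that recipe. The function `elementary_pair` is a hypothetical helper, and `fractions.Fraction` is used so that 1/c is exact.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def elementary_pair(n, kind, *args):
    """Return (E, E_bar): an elementary matrix and the one reversing its row operation."""
    E, E_bar = identity(n), identity(n)
    if kind == 1:                          # interchange rows i and j
        i, j = args
        E[i], E[j] = E[j], E[i]
        E_bar[i], E_bar[j] = E_bar[j], E_bar[i]   # a swap undoes itself
    elif kind == 2:                        # multiply row i by c != 0
        i, c = args
        E[i][i] = Fraction(c)
        E_bar[i][i] = 1 / Fraction(c)             # undo by multiplying by 1/c
    elif kind == 3:                        # add c times row src to row dst
        src, dst, c = args
        if src == dst:
            raise ValueError("type 3 operations use two different rows")
        E[dst][src] = Fraction(c)
        E_bar[dst][src] = -Fraction(c)            # undo by adding -c times row src
    else:
        raise ValueError("kind must be 1, 2, or 3")
    return E, E_bar

examples = [
    elementary_pair(3, 1, 0, 1),       # swap rows 1 and 2 of I_3
    elementary_pair(4, 2, 1, 3),       # multiply row 2 of I_4 by 3
    elementary_pair(3, 3, 2, 0, -2),   # add -2 times row 3 of I_3 to row 1
]
for E, E_bar in examples:
    n = len(E)
    # Both products equal I_n, so E_bar is the inverse of E.
    print(matmul(E_bar, E) == identity(n) and matmul(E, E_bar) == identity(n))  # True
```

In each case the reversing matrix has the same type as E, which is the second assertion of Theorem 3.2.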
1. Label the following statements as true or false.


(a) An elementary matrix is always square.
(b) The only entries of an elementary matrix are zeros and ones.
(c) The n × n identity matrix is an elementary matrix.
(d) The product of two n × n elementary matrices is an elementary
matrix.
(e) The inverse of an elementary matrix is an elementary matrix.
(f) The sum of two n × n elementary matrices is an elementary matrix.
(g) The transpose of an elementary matrix is an elementary matrix.
(h) If B is a matrix that can be obtained by performing an elementary
row operation on a matrix A, then B can also be obtained by
performing an elementary column operation on A.
(i) If B is a matrix that can be obtained by performing an elementary
row operation on a matrix A, then A can be obtained by
performing an elementary row operation on B.


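2. Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 1 & -1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 3 \\ 1 & -2 & 1 \\ 1 & -3 & 1 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & -2 \\ 1 & -3 & 1 \end{pmatrix}.$$


Find an elementary operation that transforms A into B and an elementary
operation that transforms B into C. By means of several additional
operations, transform C into $I_3$.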


3. Use the proof of Theorem 3.2 to obtain the inverse of each of the
following elementary matrices.
(b) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ (c) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}$


4. Prove the assertion made on page 149: Any elementary n × n matrix can
be obtained in at least two ways, either by performing an elementary
row operation on $I_n$ or by performing an elementary column operation
on $I_n$.


5. Prove that E is an elementary matrix if and only if $E^t$ is.


6. Let A be an m × n matrix. Prove that if B can be obtained from A by
an elementary row [column] operation, then $B^t$ can be obtained from
$A^t$ by the corresponding elementary column [row] operation.


7. Prove Theorem 3.1. 
8. Prove that if a matrix Q can be obtained from a matrix P by an
elementary row operation, then P can be obtained from Q by an elementary
row operation of the same type. Hint: Treat each type of elementary row
operation separately.


9. Prove that any elementary row [column] operation of type 1 can be 
obtained by a succession of three elementary row [column] operations 
of type 3 followed by one elementary row [column] operation of type 2. 


10. Prove that any elementary row [column] operation of type 2 can be 
obtained by dividing some row [column] by a nonzero scalar. 


11. Prove that any elementary row [column] operation of type 3 can be 
obtained by subtracting a multiple of some row [column] from another 
row [column]. 


12. Let A be an m × n matrix. Prove that there exists a sequence of
elementary row operations of types 1 and 3 that transforms A into an 
upper triangular matrix. 


3.2 THE RANK OF A MATRIX AND MATRIX INVERSES


In this section, we define the rank of a matrix. We then use elementary 
operations to compute the rank of a matrix and a linear transformation. The 
section concludes with a procedure for computing the inverse of an invertible 
matrix. 


Definition. If $A \in M_{m\times n}(F)$, we define the rank of A, denoted rank(A),
to be the rank of the linear transformation $L_A: F^n \to F^m$.


Many results about the rank of a matrix follow immediately from the 
corresponding facts about a linear transformation. An important result of 
this type, which follows from Fact 3 (p. 100) and Corollary 2 to Theorem 2.18 
(p. 102), is that an n x n matrix is invertible if and only if its rank is n. 

Every matrix A is the matrix representation of the linear transformation 
$L_A$ with respect to the appropriate standard ordered bases. Thus the rank
of the linear transformation $L_A$ is the same as the rank of one of its matrix
representations, namely, A. The next theorem extends this fact to any ma- 
trix representation of any linear transformation defined on finite-dimensional 
vector spaces. 


Theorem 3.3. Let T: V → W be a linear transformation between
finite-dimensional vector spaces, and let β and γ be ordered bases for V and W,
respectively. Then rank(T) = rank($[T]_\beta^\gamma$).


Proof. This is a restatement of Exercise 20 of Section 2.4. ∎




Now that the problem of finding the rank of a linear transformation has 
been reduced to the problem of finding the rank of a matrix, we need a result 
that allows us to perform rank-preserving operations on matrices. The next 
theorem and its corollary tell us how to do this. 


Theorem 3.4. Let A be an m × n matrix. If P and Q are invertible
m × m and n × n matrices, respectively, then
(a) rank(AQ) = rank(A),
(b) rank(PA) = rank(A),
and therefore,
(c) rank(PAQ) = rank(A).


Proof. First observe that
$$R(L_{AQ}) = R(L_AL_Q) = L_AL_Q(F^n) = L_A(L_Q(F^n)) = L_A(F^n) = R(L_A)$$
since $L_Q$ is onto. Therefore
$$\operatorname{rank}(AQ) = \dim(R(L_{AQ})) = \dim(R(L_A)) = \operatorname{rank}(A).$$


This establishes (a). To establish (b), apply Exercise 17 of Section 2.4 to
$T = L_P$. We omit the details. Finally, applying (a) and (b), we have
$$\operatorname{rank}(PAQ) = \operatorname{rank}(PA) = \operatorname{rank}(A).$$ ∎


Corollary. Elementary row and column operations on a matrix are rank- 
preserving. 


Proof. If B is obtained from a matrix A by an elementary row operation,
then there exists an elementary matrix E such that B = EA. By Theorem 3.2
(p. 150), E is invertible, and hence rank(B) = rank(A) by Theorem 3.4. The
proof that elementary column operations are rank-preserving is left as an
exercise. ∎


Now that we have a class of matrix operations that preserve rank, we 
need a way of examining a transformed matrix to ascertain its rank. The 
next theorem is the first of several in this direction. 


Theorem 3.5. The rank of any matrix equals the maximum number of its 
linearly independent columns; that is, the rank of a matrix is the dimension 
of the subspace generated by its columns. 


Proof. For any $A \in M_{m\times n}(F)$,
$$\operatorname{rank}(A) = \operatorname{rank}(L_A) = \dim(R(L_A)).$$
Let β be the standard ordered basis for $F^n$. Then β spans $F^n$ and hence, by
Theorem 2.2 (p. 68),
$$R(L_A) = \operatorname{span}(L_A(\beta)) = \operatorname{span}(\{L_A(e_1), L_A(e_2), \ldots, L_A(e_n)\}).$$
But, for any j, we have seen in Theorem 2.13(b) (p. 90) that $L_A(e_j) = Ae_j = a_j$,
where $a_j$ is the jth column of A. Hence
$$R(L_A) = \operatorname{span}(\{a_1, a_2, \ldots, a_n\}).$$
Thus
$$\operatorname{rank}(A) = \dim(R(L_A)) = \dim(\operatorname{span}(\{a_1, a_2, \ldots, a_n\})).$$ ∎
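Example 1
Let
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$


Observe that the first and second columns of A are linearly independent and
that the third column is a linear combination of the first two. Thus
$$\operatorname{rank}(A) = \dim\left(\operatorname{span}\left(\left\{\begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}\right\}\right)\right) = 2.$$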


To compute the rank of a matrix A, it is frequently useful to postpone the 
use of Theorem 3.5 until A has been suitably modified by means of appro- 
priate elementary row and column operations so that the number of linearly 
independent columns is obvious. The corollary to Theorem 3.4 guarantees 
that the rank of the modified matrix is the same as the rank of A. One 
such modification of A can be obtained by using elementary row and col- 
umn operations to introduce zero entries. The next example illustrates this 
procedure. 


Example 2 
Let
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{pmatrix}.$$


If we subtract the first row of A from rows 2 and 3 (type 3 elementary row
operations), the result is
$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 2 \\ 0 & -1 & 1 \end{pmatrix}.$$
If we now subtract twice the first column from the second and subtract the
first column from the third (type 3 elementary column operations), we obtain
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 2 \\ 0 & -1 & 1 \end{pmatrix}.$$


It is now obvious that the maximum number of linearly independent columns
of this matrix is 2. Hence the rank of A is 2.
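The idea of Example 2 (simplify with elementary operations until the number of independent rows or columns is obvious) can be automated. The following is a minimal Python sketch, not an algorithm given in the text. It uses only type 1 and type 3 row operations to reach an upper triangular (echelon) form, in the spirit of Exercise 12 of Section 3.1, and counts the pivot rows. Those rows are linearly independent, so by the corollary to Theorem 3.4 and Corollary 2 to Theorem 3.6 their number is the rank. The function name `rank` and the use of `fractions.Fraction` for exact arithmetic are our own choices. The third test matrix is the one from Example 3 below.

```python
from fractions import Fraction

def rank(A):
    """Rank of A via type 1 and type 3 elementary row operations."""
    M = [[Fraction(x) for x in row] for row in A]
    m = len(M)
    n = len(M[0]) if m else 0
    r = 0                                    # number of pivot rows found so far
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue                         # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]      # type 1: bring the pivot row up
        for i in range(r + 1, m):            # type 3: clear entries below the pivot
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def transpose(A):
    return [list(col) for col in zip(*A)]

examples = {
    "Example 1": [[1, 0, 1], [0, 1, 1], [1, 0, 1]],
    "Example 2": [[1, 2, 1], [1, 0, 3], [1, 1, 2]],
    "Example 3": [[0, 2, 4, 2, 2], [4, 4, 4, 8, 0], [8, 2, 0, 10, 2], [6, 3, 2, 9, 1]],
}
for name, A in examples.items():
    # The two numbers agree, as Corollary 2 to Theorem 3.6 predicts.
    print(name, rank(A), rank(transpose(A)))
```

Using exact fractions instead of floating point means a zero pivot is detected exactly, so the computed rank is never distorted by rounding.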


The next theorem uses this process to transform a matrix into a particu- 
larly simple form. The power of this theorem can be seen in its corollaries. 


Theorem 3.6. Let A be an m × n matrix of rank r. Then r ≤ m, r ≤ n,
and, by means of a finite number of elementary row and column operations,
A can be transformed into the matrix
$$D = \begin{pmatrix} I_r & O_1 \\ O_2 & O_3 \end{pmatrix},$$
where $O_1$, $O_2$, and $O_3$ are zero matrices. Thus $D_{ii} = 1$ for i ≤ r and $D_{ij} = 0$
otherwise.


Theorem 3.6 and its corollaries are quite important. Its proof, though 
easy to understand, is tedious to read. As an aid in following the proof, we 
first consider an example. 

Example 3 


Consider the matrix
$$A = \begin{pmatrix} 0 & 2 & 4 & 2 & 2 \\ 4 & 4 & 4 & 8 & 0 \\ 8 & 2 & 0 & 10 & 2 \\ 6 & 3 & 2 & 9 & 1 \end{pmatrix}.$$


By means of a succession of elementary row and column operations, we can 
transform A into a matrix D as in Theorem 3.6. We list many of the inter- 
mediate matrices, but on several occasions a matrix is transformed from the 
preceding one by means of several elementary operations. The number above 
each arrow indicates how many elementary operations are involved. Try to 
identify the nature of each elementary operation (row or column and type) 
in the following matrix transformations. 


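$$A \xrightarrow{1} \begin{pmatrix} 4&4&4&8&0 \\ 0&2&4&2&2 \\ 8&2&0&10&2 \\ 6&3&2&9&1 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1&1&1&2&0 \\ 0&2&4&2&2 \\ 8&2&0&10&2 \\ 6&3&2&9&1 \end{pmatrix} \xrightarrow{2} \begin{pmatrix} 1&1&1&2&0 \\ 0&2&4&2&2 \\ 0&-6&-8&-6&2 \\ 0&-3&-4&-3&1 \end{pmatrix}$$
$$\xrightarrow{3} \begin{pmatrix} 1&0&0&0&0 \\ 0&2&4&2&2 \\ 0&-6&-8&-6&2 \\ 0&-3&-4&-3&1 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1&0&0&0&0 \\ 0&1&2&1&1 \\ 0&-6&-8&-6&2 \\ 0&-3&-4&-3&1 \end{pmatrix} \xrightarrow{2} \begin{pmatrix} 1&0&0&0&0 \\ 0&1&2&1&1 \\ 0&0&4&0&8 \\ 0&0&2&0&4 \end{pmatrix}$$
$$\xrightarrow{3} \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&4&0&8 \\ 0&0&2&0&4 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&2 \\ 0&0&2&0&4 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&2 \\ 0&0&0&0&0 \end{pmatrix} \xrightarrow{1} \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&0&0 \end{pmatrix} = D$$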


By the corollary to Theorem 3.4, rank(A) = rank(D). Clearly, however,
rank(D) = 3; so rank(A) = 3.


Note that the first two elementary operations in Example 3 result in a 
1 in the 1,1 position, and the next several operations (type 3) result in 0’s 
everywhere in the first row and first column except for the 1,1 position. Sub- 
sequent elementary operations do not change the first row and first column. 
With this example in mind, we proceed with the proof of Theorem 3.6. 


Proof of Theorem 3.6. If A is the zero matrix, r = 0 by Exercise 3. In 
this case, the conclusion follows with D = A. 

Now suppose that A ≠ O and r = rank(A); then r > 0. The proof is by
mathematical induction on m, the number of rows of A.

Suppose that m = 1. By means of at most one type 1 column operation
and at most one type 2 column operation, A can be transformed into a matrix
with a 1 in the 1,1 position. By means of at most n − 1 type 3 column
operations, this matrix can in turn be transformed into the matrix
$$D = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}.$$


Note that there is one linearly independent column in D. So rank(D) =
rank(A) = 1 by the corollary to Theorem 3.4 and by Theorem 3.5. Thus the
theorem is established for m = 1.

Next assume that the theorem holds for any matrix with at most m − 1
rows (for some m > 1). We must prove that the theorem holds for any matrix
with m rows.

Suppose that A is any m x n matrix. If n = 1, Theorem 3.6 can be 
established in a manner analogous to that for m = 1 (see Exercise 10). 

We now suppose that n > 1. Since A ≠ O, $A_{ij} \neq 0$ for some i, j. By
means of at most one elementary row and at most one elementary column
operation (each of type 1), we can move the nonzero entry to the 1,1 position
(just as was done in Example 3). By means of at most one additional type 2
operation, we can assure a 1 in the 1,1 position. (Look at the second operation
in Example 3.) By means of at most m − 1 type 3 row operations and at most
n − 1 type 3 column operations, we can eliminate all nonzero entries in the
first row and the first column with the exception of the 1 in the 1,1 position.
(In Example 3, we used two row and three column operations to do this.)

Thus, with a finite number of elementary operations, A can be transformed
into a matrix
$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B' & \\ 0 & & & \end{pmatrix},$$
where B' is an (m − 1) × (n − 1) matrix. In Example 3, for instance,
$$B' = \begin{pmatrix} 2 & 4 & 2 & 2 \\ -6 & -8 & -6 & 2 \\ -3 & -4 & -3 & 1 \end{pmatrix}.$$


By Exercise 11, B' has rank one less than B. Since rank(A) = rank(B) =
r, rank(B') = r − 1. Therefore r − 1 ≤ m − 1 and r − 1 ≤ n − 1 by the
induction hypothesis. Hence r ≤ m and r ≤ n.

Also by the induction hypothesis, B' can be transformed by a finite number
of elementary row and column operations into the (m − 1) × (n − 1) matrix
D' such that
$$D' = \begin{pmatrix} I_{r-1} & O_4 \\ O_5 & O_6 \end{pmatrix},$$
where $O_4$, $O_5$, and $O_6$ are zero matrices. That is, D' consists of all zeros
except for its first r − 1 diagonal entries, which are ones. Let
$$D = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & D' & \\ 0 & & & \end{pmatrix}.$$
We see that the theorem now follows once we show that D can be obtained
from B by means of a finite number of elementary row and column operations.
However, this follows by repeated applications of Exercise 12.

Thus, since A can be transformed into B and B can be transformed into
D, each by a finite number of elementary operations, A can be transformed
into D by a finite number of elementary operations.

Finally, since D' contains ones as its first r − 1 diagonal entries, D contains
ones as its first r diagonal entries and zeros elsewhere. This establishes the
theorem. ∎


Corollary 1. Let A be an m × n matrix of rank r. Then there exist
invertible matrices B and C of sizes m × m and n × n, respectively, such that
D = BAC, where
$$D = \begin{pmatrix} I_r & O_1 \\ O_2 & O_3 \end{pmatrix}$$
is the m × n matrix in which $O_1$, $O_2$, and $O_3$ are zero matrices.


Proof. By Theorem 3.6, A can be transformed by means of a finite number
of elementary row and column operations into the matrix D. We can appeal
to Theorem 3.1 (p. 149) each time we perform an elementary operation. Thus
there exist elementary m × m matrices $E_1, E_2, \ldots, E_p$ and elementary n × n
matrices $G_1, G_2, \ldots, G_q$ such that
$$D = E_pE_{p-1}\cdots E_2E_1AG_1G_2\cdots G_q.$$
By Theorem 3.2 (p. 150), each $E_j$ and $G_j$ is invertible. Let $B = E_pE_{p-1}\cdots E_1$
and $C = G_1G_2\cdots G_q$. Then B and C are invertible by Exercise 4 of
Section 2.4, and D = BAC. ∎
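The proofs of Theorem 3.6 and Corollary 1 are constructive, and the bookkeeping can be sketched in Python. The function `normal_form` below is a hypothetical helper, not taken from the text. It follows the same plan as the proof: move a nonzero entry to the diagonal, scale it to 1, and clear its row and column with type 3 operations. Along the way it applies every row operation to B as well and every column operation to C as well, so that D = BAC at the end. It is applied to the matrix of Example 3.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def normal_form(A):
    """Return (B, C, D) with B, C invertible, D = B A C, and D = [[I_r, O], [O, O]]."""
    m, n = len(A), len(A[0])
    D = [[Fraction(x) for x in row] for row in A]
    B, C = identity(m), identity(n)       # accumulate E_p...E_1 and G_1...G_q
    for k in range(min(m, n)):
        # Look for a nonzero entry in rows k.. and columns k..
        pos = next(((i, j) for i in range(k, m) for j in range(k, n) if D[i][j] != 0), None)
        if pos is None:
            break                          # the remaining block is zero
        i, j = pos
        # Type 1 row and column operations move the entry to position (k, k).
        D[k], D[i] = D[i], D[k]
        B[k], B[i] = B[i], B[k]
        for row in D + C:                  # D and C both have n columns
            row[k], row[j] = row[j], row[k]
        # Type 2 row operation makes the (k, k) entry equal to 1.
        p = D[k][k]
        D[k] = [x / p for x in D[k]]
        B[k] = [x / p for x in B[k]]
        # Type 3 row operations clear the rest of column k.
        for r in range(m):
            if r != k and D[r][k] != 0:
                f = D[r][k]
                D[r] = [a - f * b for a, b in zip(D[r], D[k])]
                B[r] = [a - f * b for a, b in zip(B[r], B[k])]
        # Type 3 column operations clear the rest of row k.
        for c in range(n):
            if c != k and D[k][c] != 0:
                f = D[k][c]
                for row in D + C:
                    row[c] -= f * row[k]
    return B, C, D

A = [[0, 2, 4, 2, 2], [4, 4, 4, 8, 0], [8, 2, 0, 10, 2], [6, 3, 2, 9, 1]]
B, C, D = normal_form(A)
print(D == [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]])  # True
print(matmul(matmul(B, A), C) == D)  # True
```

Applying a row operation to B is the same as multiplying B on the left by the corresponding elementary matrix, and applying a column operation to C is the same as multiplying C on the right. This is why B ends up as $E_p\cdots E_1$ and C ends up as $G_1\cdots G_q$.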


Corollary 2. Let A be an m × n matrix. Then

(a) rank($A^t$) = rank(A).

(b) The rank of any matrix equals the maximum number of its linearly 
independent rows; that is, the rank of a matrix is the dimension of the 
subspace generated by its rows. 

(c) The rows and columns of any matrix generate subspaces of the same 
dimension, numerically equal to the rank of the matrix. 


Proof. (a) By Corollary 1, there exist invertible matrices B and C such
that D = BAC, where D satisfies the stated conditions of the corollary.
Taking transposes, we have
$$D^t = (BAC)^t = C^tA^tB^t.$$
Since B and C are invertible, so are $B^t$ and $C^t$ by Exercise 5 of Section 2.4.
Hence by Theorem 3.4,
$$\operatorname{rank}(A^t) = \operatorname{rank}(C^tA^tB^t) = \operatorname{rank}(D^t).$$
Suppose that r = rank(A). Then $D^t$ is an n × m matrix with the form of the
matrix D in Corollary 1, and hence rank($D^t$) = r by Theorem 3.5. Thus
$$\operatorname{rank}(A^t) = \operatorname{rank}(D^t) = r = \operatorname{rank}(A).$$
This establishes (a).
The proofs of (b) and (c) are left as exercises. (See Exercise 13.) ∎
Corollary 3. Every invertible matrix is a product of elementary matrices.


Proof. If A is an invertible n × n matrix, then rank(A) = n. Hence the
matrix D in Corollary 1 equals $I_n$, and there exist invertible matrices B and
C such that $I_n = BAC$.

As in the proof of Corollary 1, note that $B = E_pE_{p-1}\cdots E_1$ and
$C = G_1G_2\cdots G_q$, where the $E_i$'s and $G_i$'s are elementary matrices. Thus
$A = B^{-1}I_nC^{-1} = B^{-1}C^{-1}$, so that
$$A = E_1^{-1}E_2^{-1}\cdots E_p^{-1}G_q^{-1}G_{q-1}^{-1}\cdots G_1^{-1}.$$
The inverses of elementary matrices are elementary matrices, however, and
hence A is the product of elementary matrices. ∎
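Corollary 3 can also be illustrated computationally. The sketch below takes a slightly different route from the proof. It uses only row operations (Gauss-Jordan elimination), so that $E_p\cdots E_1A = I_n$ and hence $A = E_1^{-1}E_2^{-1}\cdots E_p^{-1}$, and each inverse is elementary by Theorem 3.2. The function name `elementary_factors` is hypothetical, and the test matrix is the A of Example 5 later in this section.

```python
from fractions import Fraction
from functools import reduce

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def elementary_factors(A):
    """Return elementary matrices F_1, ..., F_p with A = F_1 F_2 ... F_p."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    factors = []                          # F_k = E_k^{-1}, in the order E_k is applied
    for k in range(n):
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            raise ValueError("A is not invertible")
        if pivot != k:                    # type 1; a swap is its own inverse
            M[k], M[pivot] = M[pivot], M[k]
            F = identity(n)
            F[k], F[pivot] = F[pivot], F[k]
            factors.append(F)
        p = M[k][k]
        if p != 1:                        # type 2: scale row k by 1/p; the inverse scales by p
            M[k] = [x / p for x in M[k]]
            F = identity(n)
            F[k][k] = p
            factors.append(F)
        for i in range(n):                # type 3: row i -= c * row k; the inverse adds c * row k
            if i != k and M[i][k] != 0:
                c = M[i][k]
                M[i] = [a - c * b for a, b in zip(M[i], M[k])]
                F = identity(n)
                F[i][k] = c
                factors.append(F)
    return factors

A = [[0, 2, 4], [2, 4, 2], [3, 3, 1]]
factors = elementary_factors(A)
print(reduce(matmul, factors) == A)  # True
```

The factors are multiplied in the order they were recorded, because reversing $E_p\cdots E_1$ and inverting each factor yields $E_1^{-1}\cdots E_p^{-1}$.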


We now use Corollary 2 to relate the rank of a matrix product to the rank 
of each factor. Notice how the proof exploits the relationship between the 
rank of a matrix and the rank of a linear transformation. 


Theorem 3.7. Let T: V → W and U: W → Z be linear transformations
on finite-dimensional vector spaces V, W, and Z, and let A and B be matrices
such that the product AB is defined. Then

(a) rank(UT) ≤ rank(U).
(b) rank(UT) ≤ rank(T).
(c) rank(AB) ≤ rank(A).
(d) rank(AB) ≤ rank(B).


Proof. We prove these items in the order: (a), (c), (d), and (b).
(a) Clearly, R(T) ⊆ W. Hence
$$R(UT) = UT(V) = U(T(V)) = U(R(T)) \subseteq U(W) = R(U).$$
Thus
$$\operatorname{rank}(UT) = \dim(R(UT)) \le \dim(R(U)) = \operatorname{rank}(U).$$
(c) By (a),
$$\operatorname{rank}(AB) = \operatorname{rank}(L_{AB}) = \operatorname{rank}(L_AL_B) \le \operatorname{rank}(L_A) = \operatorname{rank}(A).$$
(d) By (c) and Corollary 2 to Theorem 3.6,
$$\operatorname{rank}(AB) = \operatorname{rank}((AB)^t) = \operatorname{rank}(B^tA^t) \le \operatorname{rank}(B^t) = \operatorname{rank}(B).$$
(b) Let α, β, and γ be ordered bases for V, W, and Z, respectively, and
let $A' = [U]_\beta^\gamma$ and $B' = [T]_\alpha^\beta$. Then $A'B' = [UT]_\alpha^\gamma$ by Theorem 2.11 (p. 88).
Hence, by Theorem 3.3 and (d),
$$\operatorname{rank}(UT) = \operatorname{rank}(A'B') \le \operatorname{rank}(B') = \operatorname{rank}(T).$$ ∎
It is important to be able to compute the rank of any matrix. We can
use the corollary to Theorem 3.4, Theorems 3.5 and 3.6, and Corollary 2 to 
Theorem 3.6 to accomplish this goal. 

The object is to perform elementary row and column operations on a 
matrix to “simplify” it (so that the transformed matrix has many zero entries) 
to the point where a simple observation enables us to determine how many 
linearly independent rows or columns the matrix has, and thus to determine 
its rank. 


Example 4 
(a) Let 


1 2 1 1 
wa, G 1 -1 i) 
Note that the first and second rows of A are linearly independent since one 
is not a multiple of the other. Thus rank(A) = 2. 


(b) Let
$$A = \begin{pmatrix} 1 & 3 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 3 & 0 & 0 \end{pmatrix}.$$
In this case, there are several ways to proceed. Suppose that we begin with
an elementary row operation to obtain a zero in the 2,1 position. Subtracting
the first row from the second row, we obtain
$$\begin{pmatrix} 1 & 3 & 1 & 1 \\ 0 & -3 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{pmatrix}.$$


Now note that the third row is a multiple of the second row, and the first and 
second rows are linearly independent. Thus rank(A) = 2. 


As an alternative method, note that the first, third, and fourth columns 
of A are identical and that the first and second columns of A are linearly 
independent. Hence rank(A) = 2. 


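(c) Let
$$A = \begin{pmatrix} 1 & 2 & 3 & 1 \\ 2 & 1 & 1 & 1 \\ 1 & -1 & 1 & 0 \end{pmatrix}.$$


Using elementary row operations, we can transform A as follows:
$$A \longrightarrow \begin{pmatrix} 1 & 2 & 3 & 1 \\ 0 & -3 & -5 & -1 \\ 0 & -3 & -2 & -1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 3 & 1 \\ 0 & -3 & -5 & -1 \\ 0 & 0 & 3 & 0 \end{pmatrix}.$$
It is clear that the last matrix has three linearly independent rows and hence
has rank 3.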


In summary, perform row and column operations until the matrix is sim- 
plified enough so that the maximum number of linearly independent rows or 
columns is obvious. 


The Inverse of a Matrix 


We have remarked that an n x n matrix is invertible if and only if its rank 
is n. Since we know how to compute the rank of any matrix, we can always 
test a matrix to determine whether it is invertible. We now provide a simple 
technique for computing the inverse of a matrix that utilizes elementary row 
operations. 


Definition. Let A and B be m × n and m × p matrices, respectively.
By the augmented matrix (A|B), we mean the m × (n + p) matrix (A B),
that is, the matrix whose first n columns are the columns of A, and whose
last p columns are the columns of B.


Let A be an invertible n x n matrix, and consider the n x 2n augmented 
matrix C = (A|I,). By Exercise 15, we have 


AC =(A71 AJA) = (In|A7?). (1) 


By Corollary 3 to Theorem 3.6, A~! is the product of elementary matrices, 
say A~! = F,,E,_-1--- Ey. Thus (1) becomes 


EyEy-1°°: Ei (AlIn) = A7'C = (In|A7*). 


Because multiplying a matrix on the left by an elementary matrix transforms
the matrix by an elementary row operation (Theorem 3.1, p. 149), we have
the following result: If A is an invertible n × n matrix, then it is possible to
transform the matrix $(A \mid I_n)$ into the matrix $(I_n \mid A^{-1})$ by means of a finite
number of elementary row operations.

Conversely, suppose that A is invertible and that, for some n × n matrix
B, the matrix $(A \mid I_n)$ can be transformed into the matrix $(I_n \mid B)$ by a finite
number of elementary row operations. Let $E_1, E_2, \ldots, E_p$ be the elementary
matrices associated with these elementary row operations as in Theorem 3.1;
then

$$E_pE_{p-1}\cdots E_1(A \mid I_n) = (I_n \mid B). \qquad (2)$$

Letting $M = E_pE_{p-1}\cdots E_1$, we have from (2) that

$$(MA \mid M) = M(A \mid I_n) = (I_n \mid B).$$

Hence $MA = I_n$ and M = B. It follows that $M = A^{-1}$. So $B = A^{-1}$. Thus
we have the following result: If A is an invertible n × n matrix, and the matrix
$(A \mid I_n)$ is transformed into a matrix of the form $(I_n \mid B)$ by means of a finite
number of elementary row operations, then $B = A^{-1}$.

If, on the other hand, A is an n × n matrix that is not invertible, then
rank(A) < n. Hence any attempt to transform $(A \mid I_n)$ into a matrix of the
form $(I_n \mid B)$ by means of elementary row operations must fail because otherwise
A can be transformed into $I_n$ using the same row operations. This
is impossible, however, because elementary row operations preserve rank. In
fact, A can be transformed into a matrix with a row containing only zero
entries, yielding the following result: If A is an n × n matrix that is not
invertible, then any attempt to transform $(A \mid I_n)$ into a matrix of the form
$(I_n \mid B)$ produces a row whose first n entries are zeros.

The next two examples demonstrate these comments. 


Example 5 


We determine whether the matrix

$$A = \begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{pmatrix}$$

is invertible, and if it is, we compute its inverse.


We attempt to use elementary row operations to transform

$$(A \mid I) = \left(\begin{array}{ccc|ccc} 0 & 2 & 4 & 1 & 0 & 0 \\ 2 & 4 & 2 & 0 & 1 & 0 \\ 3 & 3 & 1 & 0 & 0 & 1 \end{array}\right)$$

into a matrix of the form (I|B). One method for accomplishing this transformation
is to change each column of A successively, beginning with the first
column, into the corresponding column of I. Since we need a nonzero entry
in the 1,1 position, we begin by interchanging rows 1 and 2. The result is

$$\left(\begin{array}{ccc|ccc} 2 & 4 & 2 & 0 & 1 & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ 3 & 3 & 1 & 0 & 0 & 1 \end{array}\right).$$

In order to place a 1 in the 1,1 position, we must multiply the first row by $\tfrac{1}{2}$;
this operation yields

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 0 & \tfrac{1}{2} & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ 3 & 3 & 1 & 0 & 0 & 1 \end{array}\right).$$

We now complete work in the first column by adding −3 times row 1 to row
3 to obtain


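$$\left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 0 & \tfrac{1}{2} & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ 0 & -3 & -2 & 0 & -\tfrac{3}{2} & 1 \end{array}\right).$$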


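In order to change the second column of the preceding matrix into the
second column of I, we multiply row 2 by $\tfrac{1}{2}$ to obtain a 1 in the 2,2 position.
This operation produces

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 0 & \tfrac{1}{2} & 0 \\ 0 & 1 & 2 & \tfrac{1}{2} & 0 & 0 \\ 0 & -3 & -2 & 0 & -\tfrac{3}{2} & 1 \end{array}\right).$$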


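We now complete our work on the second column by adding −2 times row 2
to row 1 and 3 times row 2 to row 3. The result is

$$\left(\begin{array}{ccc|ccc} 1 & 0 & -3 & -1 & \tfrac{1}{2} & 0 \\ 0 & 1 & 2 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 4 & \tfrac{3}{2} & -\tfrac{3}{2} & 1 \end{array}\right).$$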


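Only the third column remains to be changed. In order to place a 1 in the
3,3 position, we multiply row 3 by $\tfrac{1}{4}$; this operation yields

$$\left(\begin{array}{ccc|ccc} 1 & 0 & -3 & -1 & \tfrac{1}{2} & 0 \\ 0 & 1 & 2 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{array}\right).$$

Adding 3 times row 3 to row 1 and −2 times row 3 to row 2 completes the process
and gives

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \tfrac{1}{8} & -\tfrac{5}{8} & \tfrac{3}{4} \\ 0 & 1 & 0 & -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{2} \\ 0 & 0 & 1 & \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{array}\right).$$

Thus A is invertible, and

$$A^{-1} = \begin{pmatrix} \tfrac{1}{8} & -\tfrac{5}{8} & \tfrac{3}{4} \\ -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{2} \\ \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{pmatrix}.$$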
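Example 6

We determine whether the matrix

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & -1 \\ 1 & 5 & 4 \end{pmatrix}$$

is invertible, and if it is, we compute its inverse. Using a strategy similar to
the one used in Example 5, we attempt to use elementary row operations to
transform

$$(A \mid I) = \left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 2 & 1 & -1 & 0 & 1 & 0 \\ 1 & 5 & 4 & 0 & 0 & 1 \end{array}\right)$$

into a matrix of the form (I|B). We first add −2 times row 1 to row 2 and
−1 times row 1 to row 3. We then add row 2 to row 3. The result,

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 2 & 1 & -1 & 0 & 1 & 0 \\ 1 & 5 & 4 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & -3 & -3 & -2 & 1 & 0 \\ 0 & 3 & 3 & -1 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & -3 & -3 & -2 & 1 & 0 \\ 0 & 0 & 0 & -3 & 1 & 1 \end{array}\right),$$

is a matrix with a row whose first 3 entries are zeros. Therefore A is not
invertible.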
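The procedure of Examples 5 and 6 translates directly into code. In the Python sketch below (the name `inverse` and the list-of-rows format are our own assumptions), each column of A is turned in turn into the corresponding column of I by a row interchange, a scaling, and type 3 operations on the augmented matrix $(A \mid I_n)$. If some column has no nonzero entry on or below the diagonal, that column depends on earlier ones, so rank(A) < n and the function returns `None`, as happens in Example 6.

```python
from fractions import Fraction

def inverse(a):
    """Row reduce (A | I_n) to (I_n | A^{-1}); return None if A is not invertible."""
    n = len(a)
    # Build the n x 2n augmented matrix (A | I_n) with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        # Look for a nonzero entry in column c at or below the diagonal.
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return None  # column c depends on earlier columns, so rank(A) < n
        m[c], m[pivot] = m[pivot], m[c]          # type 1: interchange rows
        p = m[c][c]
        m[c] = [x / p for x in m[c]]             # type 2: put a 1 in the c,c position
        for i in range(n):                       # type 3: clear the rest of column c
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]                # the right half is A^{-1}

inv = inverse([[0, 2, 4], [2, 4, 2], [3, 3, 1]])      # Example 5
print([[str(x) for x in row] for row in inv])
# [['1/8', '-5/8', '3/4'], ['-1/4', '3/4', '-1/2'], ['3/8', '-3/8', '1/4']]
print(inverse([[1, 2, 1], [2, 1, -1], [1, 5, 4]]))    # Example 6: None
```

Choosing the first nonzero entry as the pivot reproduces the row interchange made at the start of Example 5; any nonzero pivot would give the same inverse.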


Being able to test for invertibility and compute the inverse of a matrix 
allows us, with the help of Theorem 2.18 (p. 101) and its corollaries, to test 
for invertibility and compute the inverse of a linear transformation. The next 
example demonstrates this technique. 


Example 7

Let T: P2(R) → P2(R) be defined by T(f(x)) = f(x) + f'(x) + f''(x), where
f'(x) and f''(x) denote the first and second derivatives of f(x). We use
Corollary 1 of Theorem 2.18 (p. 102) to test T for invertibility and compute
the inverse if T is invertible. Taking β to be the standard ordered basis of
P2(R), we have

$$[T]_\beta = \begin{pmatrix} 1 & 1 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.$$
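Using the method of Examples 5 and 6, we can show that $[T]_\beta$ is invertible
with inverse

$$([T]_\beta)^{-1} = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}.$$

Thus T is invertible, and $([T]_\beta)^{-1} = [T^{-1}]_\beta$. Hence by Theorem 2.14 (p. 91),
we have

$$[T^{-1}(a_0 + a_1x + a_2x^2)]_\beta = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_0 - a_1 \\ a_1 - 2a_2 \\ a_2 \end{pmatrix}.$$

Therefore

$$T^{-1}(a_0 + a_1x + a_2x^2) = (a_0 - a_1) + (a_1 - 2a_2)x + a_2x^2.$$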
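Example 7 is easy to double-check by machine. The Python sketch below represents $a_0 + a_1x + a_2x^2$ by the coefficient list `[a0, a1, a2]` (an encoding chosen here for illustration), builds $[T]_\beta$ column by column from T(1), T(x), T(x²), and confirms that the formula for $T^{-1}$ derived above undoes T on sample inputs. The helper names are hypothetical.

```python
def derivative(coeffs):
    """Derivative of a0 + a1 x + a2 x^2 stored as [a0, a1, a2]; output has the same length."""
    return [k * coeffs[k] for k in range(1, len(coeffs))] + [0]

def T(coeffs):
    """T(f) = f + f' + f'' on P2(R), acting on coefficient lists."""
    d1 = derivative(coeffs)
    d2 = derivative(d1)
    return [a + b + c for a, b, c in zip(coeffs, d1, d2)]

def T_inverse(coeffs):
    """The formula from the text: (a0 - a1) + (a1 - 2 a2) x + a2 x^2."""
    a0, a1, a2 = coeffs
    return [a0 - a1, a1 - 2 * a2, a2]

# Column j of [T]_beta is the coordinate vector of T applied to the j-th basis vector.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]        # 1, x, x^2
columns = [T(e) for e in basis]
matrix = [[columns[j][i] for j in range(3)] for i in range(3)]
print(matrix)                      # [[1, 1, 2], [0, 1, 2], [0, 0, 1]]
print(T_inverse(T([3, -1, 5])))    # [3, -1, 5]
```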


EXERCISES 


1. Label the following statements as true or false. 


(a) The rank of a matrix is equal to the number of its nonzero columns.

(b) The product of two matrices always has rank equal to the lesser of 
the ranks of the two matrices. 

(c) The m x n zero matrix is the only m x n matrix having rank 0. 

(d) Elementary row operations preserve rank. 

(e) Elementary column operations do not necessarily preserve rank. 

(f) The rank of a matrix is equal to the maximum number of linearly 
independent rows in the matrix. 

(g) The inverse of a matrix can be computed exclusively by means of 
elementary row operations. 

(h) The rank of an n x n matrix is at most n. 

(i) An n × n matrix having rank n is invertible.


2. Find the rank of the following matrices.

(a)–(g) [The matrices for this exercise are illegible in the source.]

3. Prove that for any m × n matrix A, rank(A) = 0 if and only if A is the
zero matrix.


4. Use elementary row and column operations to transform each of the
following matrices into a matrix D satisfying the conditions of Theorem 3.6,
and then determine the rank of each matrix.

(a), (b) [The matrices for this exercise are illegible in the source.]


5. For each of the following matrices, compute the rank and the inverse if
it exists.

(a)–(h) [The matrices for this exercise are illegible in the source.]


6. For each of the following linear transformations T, determine whether
T is invertible, and compute $T^{-1}$ if it exists.

(a) T: P2(R) → P2(R) defined by T(f(x)) = f''(x) + 2f'(x) − f(x).
(b) T: P2(R) → P2(R) defined by T(f(x)) = (x + 1)f'(x).
(c) T: $R^3$ → $R^3$ defined by

$$T(a_1, a_2, a_3) = (a_1 + 2a_2 + a_3,\ -a_1 + a_2 + 2a_3,\ a_1 + a_3).$$

(d) T: $R^3$ → P2(R) defined by

$$T(a_1, a_2, a_3) = (a_1 + a_2 + a_3) + (a_1 - a_2 + a_3)x + a_1x^2.$$

(e) T: P2(R) → $R^3$ defined by T(f(x)) = (f(−1), f(0), f(1)).
(f) T: M2×2(R) → $R^4$ defined by

$$T(A) = (\operatorname{tr}(A), \operatorname{tr}(A^t), \operatorname{tr}(EA), \operatorname{tr}(AE)), \quad \text{where } E = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

7. Express the invertible matrix

$$\begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \end{pmatrix}$$

as a product of elementary matrices.


8. Let A be an m × n matrix. Prove that if c is any nonzero scalar, then
rank(cA) = rank(A).


9. Complete the proof of the corollary to Theorem 3.4 by showing that
elementary column operations preserve rank.


10. Prove Theorem 3.6 for the case that A is an m × 1 matrix.


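11. Let

$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B' & \\ 0 & & & \end{pmatrix},$$

where B' is an m × n submatrix of B. Prove that if rank(B) = r, then
rank(B') = r − 1.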


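12. Let B' and D' be m × n matrices, and let B and D be (m + 1) × (n + 1)
matrices respectively defined by

$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B' & \\ 0 & & & \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & D' & \\ 0 & & & \end{pmatrix}.$$

Prove that if B' can be transformed into D' by an elementary row
[column] operation, then B can be transformed into D by an elementary
row [column] operation.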
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13. 
14. 


15. 


16. 
17. 


18. 


19. 


20. 


21. 


22. 


3.3 
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Prove (b) and (c) of Corollary 2 to Theorem 3.6. 


14. Let T, U: V → W be linear transformations.

(a) Prove that R(T + U) ⊆ R(T) + R(U). (See the definition of the sum
of subsets of a vector space on page 22.)

(b) Prove that if W is finite-dimensional, then rank(T + U) ≤ rank(T) +
rank(U).

(c) Deduce from (b) that rank(A + B) ≤ rank(A) + rank(B) for any
m × n matrices A and B.


15. Suppose that A and B are matrices having n rows. Prove that
M(A|B) = (MA|MB) for any m × n matrix M.


16. Supply the details to the proof of (b) of Theorem 3.4.


17. Prove that if B is a 3 × 1 matrix and C is a 1 × 3 matrix, then the 3 × 3
matrix BC has rank at most 1. Conversely, show that if A is any 3 × 3
matrix having rank 1, then there exist a 3 × 1 matrix B and a 1 × 3
matrix C such that A = BC.


18. Let A be an m × n matrix and B be an n × p matrix. Prove that AB
can be written as a sum of n matrices of rank one.


19. Let A be an m × n matrix with rank m and B be an n × p matrix with
rank n. Determine the rank of AB. Justify your answer.


20. Let

$$A = \begin{pmatrix} 1 & 0 & -1 & 2 & 1 \\ -1 & 1 & 3 & -1 & 0 \\ -2 & 1 & 4 & -1 & 3 \\ 3 & -1 & -5 & 1 & -6 \end{pmatrix}.$$

(a) Find a 5 × 5 matrix M with rank 2 such that AM = O, where O
is the 4 × 5 zero matrix.

(b) Suppose that B is a 5 × 5 matrix such that AB = O. Prove that
rank(B) ≤ 2.


21. Let A be an m × n matrix with rank m. Prove that there exists an
n × m matrix B such that $AB = I_m$.


22. Let B be an n × m matrix with rank m. Prove that there exists an
m × n matrix A such that $AB = I_m$.


3.3 SYSTEMS OF LINEAR EQUATIONS—THEORETICAL ASPECTS


This section and the next are devoted to the study of systems of linear equa- 
tions, which arise naturally in both the physical and social sciences. In this 
section, we apply results from Chapter 2 to describe the solution sets of 
systems of linear equations as subsets of a vector space. In Section 3.4,
elementary row operations are used to provide a computational method for
finding all solutions to such systems.

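The system of equations

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\ \,\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m, \end{aligned} \tag{S}$$

where $a_{ij}$ and $b_i$ ($1 \le i \le m$ and $1 \le j \le n$) are scalars in a field F and
$x_1, x_2, \ldots, x_n$ are n variables taking values in F, is called a system of m
linear equations in n unknowns over the field F.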

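The m × n matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

is called the coefficient matrix of the system (S).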


If we let

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix},$$

then the system (S) may be rewritten as a single matrix equation

$$Ax = b.$$

To exploit the results that we have developed, we often consider a system of
linear equations as a single matrix equation.
A solution to the system (S) is an n-tuple

$$s = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{pmatrix} \in F^n$$

such that As = b. The set of all solutions to the system (S) is called the
solution set of the system. System (S) is called consistent if its solution
set is nonempty; otherwise it is called inconsistent.
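Example 1
(a) Consider the system

$$\begin{aligned} x_1 + x_2 &= 3 \\ x_1 - x_2 &= 1. \end{aligned}$$

By use of familiar techniques, we can solve the preceding system and conclude
that there is only one solution: $x_1 = 2$, $x_2 = 1$; that is,

$$s = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.$$

In matrix form, the system can be written

$$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix};$$

so

$$A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 3 \\ 1 \end{pmatrix}.$$

(b) Consider

$$\begin{aligned} 2x_1 + 3x_2 + x_3 &= 1 \\ x_1 - x_2 + 2x_3 &= 6; \end{aligned}$$

that is,

$$\begin{pmatrix} 2 & 3 & 1 \\ 1 & -1 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 6 \end{pmatrix}.$$

This system has many solutions, such as

$$s = \begin{pmatrix} -6 \\ 2 \\ 7 \end{pmatrix} \quad \text{and} \quad s = \begin{pmatrix} 8 \\ -4 \\ -3 \end{pmatrix}.$$

(c) Consider

$$\begin{aligned} x_1 + x_2 &= 0 \\ x_1 + x_2 &= 1; \end{aligned}$$

that is,

$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

It is evident that this system has no solutions. Thus we see that a system of
linear equations can have one, many, or no solutions.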
We must be able to recognize when a system has a solution and then be
able to describe all its solutions. This section and the next are devoted to
this end.

We begin our study of systems of linear equations by examining the class 
of homogeneous systems of linear equations. Our first result (Theorem 3.8) 
shows that the set of solutions to a homogeneous system of m linear equations 
in n unknowns forms a subspace of $F^n$. We can then apply the theory of vector
spaces to this set of solutions. For example, a basis for the solution space can 
be found, and any solution can be expressed as a linear combination of the 
vectors in the basis. 


Definitions. A system Ax = b of m linear equations in n unknowns
is said to be homogeneous if b = 0. Otherwise the system is said to be 
nonhomogeneous. 


Any homogeneous system has at least one solution, namely, the zero vec- 
tor. The next result gives further information about the set of solutions to a 
homogeneous system. 


Theorem 3.8. Let Ax = 0 be a homogeneous system of m linear equations
in n unknowns over a field F. Let K denote the set of all solutions
to Ax = 0. Then $K = N(L_A)$; hence K is a subspace of $F^n$ of dimension
$n - \operatorname{rank}(L_A) = n - \operatorname{rank}(A)$.


Proof. Clearly, $K = \{s \in F^n : As = 0\} = N(L_A)$. The second part now
follows from the dimension theorem (p. 70). ■


Corollary. If m < n, the system Ax = 0 has a nonzero solution.
Proof. Suppose that m < n. Then $\operatorname{rank}(A) = \operatorname{rank}(L_A) \le m$. Hence

$$\dim(K) = n - \operatorname{rank}(L_A) \ge n - m > 0,$$

where $K = N(L_A)$. Since dim(K) > 0, K ≠ {0}. Thus there exists a nonzero
vector s ∈ K; so s is a nonzero solution to Ax = 0. ■


Example 2
(a) Consider the system

$$\begin{aligned} x_1 + 2x_2 + x_3 &= 0 \\ x_1 - x_2 - x_3 &= 0. \end{aligned}$$

Let

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & -1 \end{pmatrix}$$

be the coefficient matrix of this system. It is clear that rank(A) = 2. If K is
the solution set of this system, then dim(K) = 3 − 2 = 1. Thus any nonzero
solution constitutes a basis for K. For example, since

$$\begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix}$$

is a solution to the given system,

$$\left\{ \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} \right\}$$

is a basis for K. Thus any vector in K is of the form

$$t\begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} = \begin{pmatrix} t \\ -2t \\ 3t \end{pmatrix},$$

where t ∈ R.


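(b) Consider the system $x_1 - 2x_2 + x_3 = 0$ of one equation in three
unknowns. If $A = (1 \ \ {-2} \ \ 1)$ is the coefficient matrix, then rank(A) = 1.
Hence if K is the solution set, then dim(K) = 3 − 1 = 2. Note that

$$\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$$

are linearly independent vectors in K. Thus they constitute a basis for K, so
that

$$K = \left\{ t_1\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + t_2\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} : t_1, t_2 \in R \right\}.$$ ♦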


In Section 3.4, explicit computational methods for finding a basis for the 
solution set of a homogeneous system are discussed. 
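Although the systematic method is deferred to Section 3.4, a small Python sketch can already produce a basis for $K = N(L_A)$: reduce A to reduced row echelon form, then build one solution for each variable whose column has no pivot by setting that variable to 1 and the other free variables to 0. The helper names `rref` and `null_space_basis` are our own. For Example 2(a) the sketch returns a scalar multiple of the basis vector chosen in the text, and for Example 2(b) it returns exactly the two vectors used there.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix and its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # no pivot: x_c is a free variable
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """Basis of the solution set of Ax = 0, one vector per free variable."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_index, pc in enumerate(pivots):
            v[pc] = -m[row_index][free]       # solve for the pivot variable of this row
        basis.append(v)
    return basis

basis_a = null_space_basis([[1, 2, 1], [1, -1, -1]])   # Example 2(a)
basis_b = null_space_basis([[1, -2, 1]])                # Example 2(b)
print([[str(x) for x in v] for v in basis_a])  # [['1/3', '-2/3', '1']]
print([[str(x) for x in v] for v in basis_b])  # [['2', '1', '0'], ['-1', '0', '1']]
```

The number of vectors returned is n minus the number of pivots, which matches dim(K) = n − rank(A) from Theorem 3.8.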

We now turn to the study of nonhomogeneous systems. Our next result
shows that the solution set of a nonhomogeneous system Ax = b can be
described in terms of the solution set of the homogeneous system Ax = 0. We
refer to the equation Ax = 0 as the homogeneous system corresponding
to Ax = b.


Theorem 3.9. Let K be the solution set of a system of linear equations
Ax = b, and let $K_H$ be the solution set of the corresponding homogeneous
system Ax = 0. Then for any solution s to Ax = b,

$$K = \{s\} + K_H = \{s + k : k \in K_H\}.$$
Proof. Let s be any solution to Ax = b. We must show that $K = \{s\} + K_H$.
If w ∈ K, then Aw = b. Hence

$$A(w - s) = Aw - As = b - b = 0.$$

So $w - s \in K_H$. Thus there exists $k \in K_H$ such that w − s = k. It follows that
$w = s + k \in \{s\} + K_H$, and therefore

$$K \subseteq \{s\} + K_H.$$

Conversely, suppose that $w \in \{s\} + K_H$; then w = s + k for some $k \in K_H$.
But then Aw = A(s + k) = As + Ak = b + 0 = b; so w ∈ K. Therefore
$\{s\} + K_H \subseteq K$, and thus $K = \{s\} + K_H$. ■

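Example 3
(a) Consider the system

$$\begin{aligned} x_1 + 2x_2 + x_3 &= 7 \\ x_1 - x_2 - x_3 &= -4. \end{aligned}$$

The corresponding homogeneous system is the system in Example 2(a). It is
easily verified that

$$s = \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}$$

is a solution to the preceding nonhomogeneous system. So the solution set of
the system is

$$K = \left\{ \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} : t \in R \right\}$$

by Theorem 3.9.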


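(b) Consider the system $x_1 - 2x_2 + x_3 = 4$. The corresponding homogeneous
system is the system in Example 2(b). Since

$$s = \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix}$$

is a solution to the given system, the solution set K can be written as

$$K = \left\{ \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix} + t_1\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + t_2\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} : t_1, t_2 \in R \right\}.$$ ♦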
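Theorem 3.9 can be spot-checked on Example 3(a): every vector s + tk, with s the particular solution and k the basis vector of $K_H$ from Example 2(a), should satisfy Ax = b. The Python snippet below (a check on a few sample values of t, not a proof; `mat_vec` is our own helper) does this with exact rational arithmetic.

```python
from fractions import Fraction

A = [[1, 2, 1], [1, -1, -1]]   # coefficient matrix of Example 3(a)
b = [7, -4]
s = [1, 1, 4]                  # particular solution
k = [1, -2, 3]                 # basis vector of K_H from Example 2(a)

def mat_vec(matrix, x):
    """Compute the product Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]

assert mat_vec(A, k) == [0, 0]   # k solves the homogeneous system
for t in [Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(5)]:
    x = [si + t * ki for si, ki in zip(s, k)]
    assert mat_vec(A, x) == b    # each s + t k lies in the solution set K
print("all sampled vectors s + t k solve Ax = b")
```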
The following theorem provides us with a means of computing solutions
to certain systems of linear equations.


Theorem 3.10. Let Ax = b be a system of n linear equations in n
unknowns. If A is invertible, then the system has exactly one solution, namely,
$A^{-1}b$. Conversely, if the system has exactly one solution, then A is invertible.

Proof. Suppose that A is invertible. Substituting $A^{-1}b$ into the system, we
have $A(A^{-1}b) = (AA^{-1})b = b$. Thus $A^{-1}b$ is a solution. If s is an arbitrary
solution, then As = b. Multiplying both sides by $A^{-1}$ gives $s = A^{-1}b$. Thus
the system has one and only one solution, namely, $A^{-1}b$.

Conversely, suppose that the system has exactly one solution s. Let $K_H$
denote the solution set for the corresponding homogeneous system Ax = 0.
By Theorem 3.9, $\{s\} = \{s\} + K_H$. But this is so only if $K_H = \{0\}$. Thus
$N(L_A) = \{0\}$; since $L_A$ maps $F^n$ into $F^n$, it is therefore invertible, and hence A is invertible. ■


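Example 4
Consider the following system of three linear equations in three unknowns:

$$\begin{aligned} 2x_2 + 4x_3 &= 2 \\ 2x_1 + 4x_2 + 2x_3 &= 3 \\ 3x_1 + 3x_2 + x_3 &= 1. \end{aligned}$$

In Example 5 of Section 3.2, we computed the inverse of the coefficient matrix
A of this system. Thus the system has exactly one solution, namely,

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = A^{-1}b = \begin{pmatrix} \tfrac{1}{8} & -\tfrac{5}{8} & \tfrac{3}{4} \\ -\tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{1}{2} \\ \tfrac{3}{8} & -\tfrac{3}{8} & \tfrac{1}{4} \end{pmatrix}\begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} -\tfrac{7}{8} \\ \tfrac{5}{4} \\ -\tfrac{1}{8} \end{pmatrix}.$$ ♦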
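The computation in Example 4 is a single matrix-vector product. The Python sketch below enters the inverse found in Example 5 of Section 3.2 by hand as exact fractions, multiplies it by b, and then confirms that the result satisfies the original system. The helper name `mat_vec` is our own.

```python
from fractions import Fraction

A_inv = [[Fraction(1, 8), Fraction(-5, 8), Fraction(3, 4)],
         [Fraction(-1, 4), Fraction(3, 4), Fraction(-1, 2)],
         [Fraction(3, 8), Fraction(-3, 8), Fraction(1, 4)]]
A = [[0, 2, 4], [2, 4, 2], [3, 3, 1]]
b = [2, 3, 1]

def mat_vec(M, v):
    """Matrix-vector product Mv for M given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

x = mat_vec(A_inv, b)        # Theorem 3.10: the unique solution is A^{-1} b
print([str(v) for v in x])   # ['-7/8', '5/4', '-1/8']
assert mat_vec(A, x) == b    # x really solves Ax = b
```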


We use this technique for solving systems of linear equations having in- 
vertible coefficient matrices in the application that concludes this section. 

In Example 1(c), we saw a system of linear equations that has no solutions. 
We now establish a criterion for determining when a system has solutions. 
This criterion involves the rank of the coefficient matrix of the system Ax = b
and the rank of the matrix (A|b). The matrix (A|b) is called the augmented
matrix of the system Ax = b.


Theorem 3.11. Let Ax = b be a system of linear equations. Then the
system is consistent if and only if rank(A) = rank(A|b).

Proof. To say that Ax = b has a solution is equivalent to saying that
$b \in R(L_A)$. (See Exercise 9.) In the proof of Theorem 3.5 (p. 153), we saw
that

$$R(L_A) = \operatorname{span}(\{a_1, a_2, \ldots, a_n\}),$$

the span of the columns of A. Thus Ax = b has a solution if and only
if $b \in \operatorname{span}(\{a_1, a_2, \ldots, a_n\})$. But $b \in \operatorname{span}(\{a_1, a_2, \ldots, a_n\})$ if and only
if $\operatorname{span}(\{a_1, a_2, \ldots, a_n\}) = \operatorname{span}(\{a_1, a_2, \ldots, a_n, b\})$. Since the first of these
subspaces is contained in the second, this last statement is equivalent to

$$\dim(\operatorname{span}(\{a_1, a_2, \ldots, a_n\})) = \dim(\operatorname{span}(\{a_1, a_2, \ldots, a_n, b\})).$$

So by Theorem 3.5, the preceding equation reduces to

$$\operatorname{rank}(A) = \operatorname{rank}(A \mid b).$$ ■


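Example 5

Recall the system of equations

$$\begin{aligned} x_1 + x_2 &= 0 \\ x_1 + x_2 &= 1 \end{aligned}$$

in Example 1(c). Since

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad (A \mid b) = \left(\begin{array}{cc|c} 1 & 1 & 0 \\ 1 & 1 & 1 \end{array}\right),$$

rank(A) = 1 and rank(A|b) = 2. Because the two ranks are unequal, the
system has no solutions. ♦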


Example 6

We can use Theorem 3.11 to determine whether (3, 3, 2) is in the range of the
linear transformation T: $R^3$ → $R^3$ defined by

$$T(a_1, a_2, a_3) = (a_1 + a_2 + a_3,\ a_1 - a_2 + a_3,\ a_1 + a_3).$$

Now (3, 3, 2) ∈ R(T) if and only if there exists a vector $s = (s_1, s_2, s_3)$
in $R^3$ such that T(s) = (3, 3, 2). Such a vector s must be a solution to the
system

$$\begin{aligned} x_1 + x_2 + x_3 &= 3 \\ x_1 - x_2 + x_3 &= 3 \\ x_1 + x_3 &= 2. \end{aligned}$$

Since the ranks of the coefficient matrix and the augmented matrix of this
system are 2 and 3, respectively, it follows that this system has no solutions.
Hence (3, 3, 2) ∉ R(T). ♦
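Theorem 3.11 gives an algorithmic test for consistency: compare rank(A) with rank(A|b). The Python sketch below (helper names `rank` and `is_consistent` are our own) defines a row-reduction rank function with exact arithmetic and reproduces the conclusions of Examples 5 and 6; the last call uses the image T(1, 0, 0) = (1, 1, 1), which must lie in R(T).

```python
from fractions import Fraction

def rank(rows):
    """Number of pivot rows after reducing to echelon form (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_consistent(A, b):
    """Theorem 3.11: Ax = b has a solution iff rank(A) == rank(A|b)."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)

T_matrix = [[1, 1, 1], [1, -1, 1], [1, 0, 1]]
print(is_consistent([[1, 1], [1, 1]], [0, 1]))   # Example 5: False
print(is_consistent(T_matrix, [3, 3, 2]))        # Example 6: False
print(is_consistent(T_matrix, [1, 1, 1]))        # T(1, 0, 0): True
```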
An Application


In 1973, Wassily Leontief won the Nobel prize in economics for his work 
in developing a mathematical model that can be used to describe various 
economic phenomena. We close this section by applying some of the ideas we 
have studied to illustrate two special cases of his work. 

We begin by considering a simple society composed of three people 
(industries)—a farmer who grows all the food, a tailor who makes all the 
clothing, and a carpenter who builds all the housing. We assume that each 
person sells to and buys from a central pool and that everything produced is 
consumed. Since no commodities either enter or leave the system, this case 
is referred to as the closed model. 

Each of these three individuals consumes all three of the commodities pro- 
duced in the society. Suppose that the proportion of each of the commodities 
consumed by each person is given in the following table. Notice that each of 
the columns of the table must sum to 1. 


Food Clothing Housing 
Farmer 0.40 0.20 0.20 
Tailor 0.10 0.70 0.20 
Carpenter 0.50 0.10 0.60 


Let $p_1$, $p_2$, and $p_3$ denote the incomes of the farmer, tailor, and carpenter,
respectively. To ensure that this society survives, we require that the consumption
of each individual equals his or her income. Note that the farmer
consumes 20% of the clothing. Because the total cost of all clothing is $p_2$,
the tailor's income, the amount spent by the farmer on clothing is $0.20p_2$.
Moreover, the amount spent by the farmer on food, clothing, and housing
must equal the farmer's income, and so we obtain the equation

$$0.40p_1 + 0.20p_2 + 0.20p_3 = p_1.$$

Similar equations describing the expenditures of the tailor and carpenter produce
the following system of linear equations:

$$\begin{aligned} 0.40p_1 + 0.20p_2 + 0.20p_3 &= p_1 \\ 0.10p_1 + 0.70p_2 + 0.20p_3 &= p_2 \\ 0.50p_1 + 0.10p_2 + 0.60p_3 &= p_3. \end{aligned}$$

This system can be written as Ap = p, where

$$p = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}$$

and A is the coefficient matrix of the system. In this context, A is called
the input-output (or consumption) matrix, and Ap = p is called the
equilibrium condition.

For vectors $b = (b_1, b_2, \ldots, b_n)$ and $c = (c_1, c_2, \ldots, c_n)$ in $R^n$, we use the
notation b ≥ c [b > c] to mean $b_i \ge c_i$ [$b_i > c_i$] for all i. The vector b is called
nonnegative [positive] if b ≥ 0 [b > 0].

At first, it may seem reasonable to replace the equilibrium condition by
the inequality Ap ≤ p, that is, the requirement that consumption not exceed
production. But, in fact, Ap ≤ p implies that Ap = p in the closed model.
For otherwise, there exists a k for which

$$p_k > \sum_j A_{kj}p_j.$$

Since $p_i \ge \sum_j A_{ij}p_j$ for every i, with strict inequality for i = k, summing over i
and using the fact that the columns of A sum to 1 gives

$$\sum_i p_i > \sum_i \sum_j A_{ij}p_j = \sum_j \Big(\sum_i A_{ij}\Big)p_j = \sum_j p_j,$$

which is a contradiction.
One solution to the homogeneous system (I − A)x = 0, which is equivalent
to the equilibrium condition, is

$$p = \begin{pmatrix} 0.25 \\ 0.35 \\ 0.40 \end{pmatrix}.$$

We may interpret this to mean that the society survives if the farmer, tailor,
and carpenter have incomes in the proportions 25 : 35 : 40 (or 5 : 7 : 8).
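The equilibrium vector can be recovered mechanically. In the Python sketch below (helper names are our own), the consumption table is entered exactly with `Fraction` (decimal strings such as "0.40" become exactly 2/5), a basis for the solution set of (I − A)x = 0 is computed by Gauss–Jordan elimination, and the single basis vector is rescaled so that its entries sum to 1.

```python
from fractions import Fraction

# Consumption table from the text: rows are farmer, tailor, carpenter;
# columns are food, clothing, housing.
A = [[Fraction(s) for s in row] for row in [["0.40", "0.20", "0.20"],
                                            ["0.10", "0.70", "0.20"],
                                            ["0.50", "0.10", "0.60"]]]
n = len(A)

def null_space_basis(rows):
    """Basis of the solution set of Mx = 0 by Gauss-Jordan elimination."""
    m = [list(row) for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                               # x_c is a free variable
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    basis = []
    for free in [j for j in range(len(m[0])) if j not in pivots]:
        v = [Fraction(0)] * len(m[0])
        v[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -m[i][free]
        basis.append(v)
    return basis

I_minus_A = [[Fraction(int(i == j)) - A[i][j] for j in range(n)] for i in range(n)]
(p,) = null_space_basis(I_minus_A)   # the solution set is one-dimensional here
total = sum(p)
p = [x / total for x in p]           # rescale so the proportions sum to 1
print([str(x) for x in p])           # ['1/4', '7/20', '2/5']
Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
print(Ap == p)                       # True: the equilibrium condition Ap = p
```

The fractions 1/4, 7/20, and 2/5 are exactly 0.25, 0.35, and 0.40.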

Notice that we are not simply interested in any nonzero solution to the 
system, but in one that is nonnegative. Thus we must consider the question 
of whether the system (I − A)x = 0 has a nonnegative solution, where A is a
matrix with nonnegative entries whose columns sum to 1. A useful theorem 
in this direction (whose proof may be found in “Applications of Matrices to 
Economic Models and Social Science Relationships,” by Ben Noble, Proceed- 
ings of the Summer Conference for College Teachers on Applied Mathematics, 
1971, CUPM, Berkeley, California) is stated below. 


Theorem 3.12. Let A be an n × n input-output matrix having the form

$$A = \begin{pmatrix} B & C \\ D & E \end{pmatrix},$$

where D is a 1 × (n − 1) positive vector and C is an (n − 1) × 1 positive vector.
Then (I − A)x = 0 has a one-dimensional solution set that is generated by a
nonnegative vector.
Observe that any input-output matrix with all positive entries satisfies
the hypothesis of this theorem. The following matrix does also:

$$\begin{pmatrix} 0.75 & 0.50 & 0.65 \\ 0 & 0.25 & 0.35 \\ 0.25 & 0.25 & 0 \end{pmatrix}.$$


In the open model, we assume that there is an outside demand for each
of the commodities produced. Returning to our simple society, let $x_1$, $x_2$,
and $x_3$ be the monetary values of food, clothing, and housing produced with
respective outside demands $d_1$, $d_2$, and $d_3$. Let A be the 3 × 3 matrix such
that $A_{ij}$ represents the amount (in a fixed monetary unit such as the dollar)
of commodity i required to produce one monetary unit of commodity j. Then
the value of the surplus of food in the society is

$$x_1 - (A_{11}x_1 + A_{12}x_2 + A_{13}x_3),$$

that is, the value of food produced minus the value of food consumed while
producing the three commodities. The assumption that everything produced
is consumed gives us a similar equilibrium condition for the open model,
namely, that the surplus of each of the three commodities must equal the
corresponding outside demands. Hence

$$x_i - \sum_{j=1}^{3} A_{ij}x_j = d_i \quad \text{for } i = 1, 2, 3.$$

In general, we must find a nonnegative solution to (I − A)x = d, where
A is a matrix with nonnegative entries such that the sum of the entries of
each column of A does not exceed one, and d ≥ 0. It is easy to see that if
$(I - A)^{-1}$ exists and is nonnegative, then the desired solution is $(I - A)^{-1}d$.

Recall that for a real number a, the series $1 + a + a^2 + \cdots$ converges to
$(1 - a)^{-1}$ if |a| < 1. Similarly, it can be shown (using the concept of convergence
of matrices developed in Section 5.3) that the series $I + A + A^2 + \cdots$
converges to $(I - A)^{-1}$ if $\{A^n\}$ converges to the zero matrix. In this case,
$(I - A)^{-1}$ is nonnegative since the matrices $I, A, A^2, \ldots$ are nonnegative.
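The series statement can be illustrated numerically. The Python sketch below (plain floats; helper names are our own) sums $I + A + \cdots + A^{k-1}$ for the input-output matrix used in the illustration that follows in the text and compares the partial sum with the value of $(I - A)^{-1}$ stated there. It assumes, without proof, that $A^n \to 0$ for this matrix; every partial sum is visibly nonnegative, as the argument above predicts.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def neumann_partial_sum(A, terms):
    """Return I + A + A^2 + ... + A^(terms - 1)."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = identity
    for _ in range(terms - 1):
        power = mat_mul(power, A)
        total = [[t + q for t, q in zip(tr, qr)] for tr, qr in zip(total, power)]
    return total

A = [[0.30, 0.20, 0.30],
     [0.10, 0.40, 0.10],
     [0.30, 0.20, 0.30]]
inverse_from_text = [[2.0, 1.0, 1.0], [0.5, 2.0, 0.5], [1.0, 1.0, 2.0]]

S = neumann_partial_sum(A, 200)
error = max(abs(s - e) for sr, er in zip(S, inverse_from_text) for s, e in zip(sr, er))
print(error < 1e-9)                          # True: partial sums approach (I - A)^{-1}
print(all(x >= 0 for row in S for x in row))  # True: every partial sum is nonnegative
```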

To illustrate the open model, suppose that 30 cents worth of food, 10 
cents worth of clothing, and 30 cents worth of housing are required for the 
production of $1 worth of food. Similarly, suppose that 20 cents worth of 
food, 40 cents worth of clothing, and 20 cents worth of housing are required 
for the production of $1 of clothing. Finally, suppose that 30 cents worth of 
food, 10 cents worth of clothing, and 30 cents worth of housing are required 
for the production of $1 worth of housing. Then the input-output matrix is 


$$A = \begin{pmatrix} 0.30 & 0.20 & 0.30 \\ 0.10 & 0.40 & 0.10 \\ 0.30 & 0.20 & 0.30 \end{pmatrix},$$

so

$$I - A = \begin{pmatrix} 0.70 & -0.20 & -0.30 \\ -0.10 & 0.60 & -0.10 \\ -0.30 & -0.20 & 0.70 \end{pmatrix} \quad \text{and} \quad (I - A)^{-1} = \begin{pmatrix} 2.0 & 1.0 & 1.0 \\ 0.5 & 2.0 & 0.5 \\ 1.0 & 1.0 & 2.0 \end{pmatrix}.$$

Since $(I - A)^{-1}$ is nonnegative, we can find a (unique) nonnegative solution to
(I − A)x = d for any demand d. For example, suppose that there are outside
demands for $30 billion in food, $20 billion in clothing, and $10 billion in 
housing. If we set 


$$d = \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix},$$

then

$$x = (I - A)^{-1}d = \begin{pmatrix} 90 \\ 60 \\ 70 \end{pmatrix}.$$


So a gross production of $90 billion of food, $60 billion of clothing, and $70 
billion of housing is necessary to meet the required demands. 
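The final computation can also be done without forming $(I - A)^{-1}$: row reduce the augmented matrix $(I - A \mid d)$ to $(I \mid x)$. The Python sketch below (the helper `solve` is our own and assumes its matrix is invertible) does this with exact `Fraction` arithmetic, entering the coefficients as decimal strings so that 0.30 is exactly 3/10, and reproduces the gross production of 90, 60, and 70 (billions of dollars).

```python
from fractions import Fraction

def solve(M, d):
    """Solve Mx = d for invertible M by reducing (M | d) to (I | x)."""
    n = len(M)
    aug = [list(row) + [di] for row, di in zip(M, d)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)   # assumes M invertible
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

A = [[Fraction(s) for s in row] for row in [["0.30", "0.20", "0.30"],
                                            ["0.10", "0.40", "0.10"],
                                            ["0.30", "0.20", "0.30"]]]
I_minus_A = [[Fraction(int(i == j)) - A[i][j] for j in range(3)] for i in range(3)]
demand = [Fraction(30), Fraction(20), Fraction(10)]   # billions of dollars

x = solve(I_minus_A, demand)
print([str(v) for v in x])   # ['90', '60', '70']
```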


EXERCISES 


1. Label the following statements as true or false. 


(a) Any system of linear equations has at least one solution. 

(b) Any system of linear equations has at most one solution. 

(c) Any homogeneous system of linear equations has at least one so- 
lution. 

(d) Any system of n linear equations in n unknowns has at most one 
solution. 

(e) Any system of n linear equations in n unknowns has at least one 
solution. 

(f) If the homogeneous system corresponding to a given system of lin- 
ear equations has a solution, then the given system has a solution. 

(g) If the coefficient matrix of a homogeneous system of n linear equations
in n unknowns is invertible, then the system has no nonzero
solutions.

(h) The solution set of any system of m linear equations in n unknowns
is a subspace of $F^n$.


2. For each of the following homogeneous systems of linear equations, find
the dimension of and a basis for the solution set.

(a) $x_1 + 3x_2 = 0$ [second equation illegible in source]
(b) $x_1 + x_2 - x_3 = 0$, $4x_1 + x_2 - 2x_3 = 0$
(c) [system illegible in source]
(d) $2x_1 + x_2 - x_3 = 0$, $x_1 - x_2 + x_3 = 0$, $x_1 + 2x_2 - 2x_3 = 0$
(e) $x_1 + 2x_2 - 3x_3 + x_4 = 0$
(f) $x_1 + 2x_2 = 0$, $x_1 - x_2 = 0$
(g) $x_1 + 2x_2 + x_3 + x_4 = 0$, $x_2 - x_3 + x_4 = 0$


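3. Using the results of Exercise 2, find all solutions to the following systems.

(a) $x_1 + 3x_2 = 5$ [second equation illegible in source]
(b) $x_1 + x_2 - x_3 = 1$, $4x_1 + x_2 - 2x_3 = 3$
(c) $x_1 + 2x_2 - x_3 = 3$ [second equation illegible in source]
(d) $2x_1 + x_2 - x_3 = 5$, $x_1 - x_2 + x_3 = 1$, $x_1 + 2x_2 - 2x_3 = 4$
(e) $x_1 + 2x_2 - 3x_3 + x_4 = 1$
(f) [system illegible in source]
(g) $x_1 + 2x_2 + x_3 + x_4 = 1$, $x_2 - x_3 + x_4 = 1$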


4. For each system of linear equations with the invertible coefficient matrix
A,
(1) Compute $A^{-1}$.
(2) Use $A^{-1}$ to solve the system.

(a) $x_1 + 3x_2 = 4$, $2x_1 + 5x_2 = 3$
(b) $x_1 + 2x_2 - x_3 = 5$, $x_1 + x_2 + x_3 = 1$, $2x_1 - 2x_2 + x_3 = 4$


5. Give an example of a system of n linear equations in n unknowns with
infinitely many solutions.

6. Let T: $R^3$ → $R^2$ be defined by T(a, b, c) = (a + b, 2a − c). Determine
$T^{-1}(1, 11)$.

7. Determine which of the following systems of linear equations has a solution.

(a)–(e) [The systems for this exercise are illegible in the source.]


8. Let T: $R^3$ → $R^3$ be defined by T(a, b, c) = (a + b, b − 2c, a + 2c). For
each vector v in $R^3$, determine whether v ∈ R(T).

(a) v = (1, 3, −2)   (b) v = (2, 1, 1)


9. Prove that the system of linear equations Ax = b has a solution if and
only if $b \in R(L_A)$.


10. Prove or give a counterexample to the following statement: If the coefficient
matrix of a system of m linear equations in n unknowns has
rank m, then the system has a solution.


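11. In the closed model of Leontief with food, clothing, and housing as the
basic industries, suppose that the input-output matrix is

$$A = \begin{pmatrix} \tfrac{7}{16} & \tfrac{1}{2} & \tfrac{3}{16} \\ \tfrac{5}{16} & \tfrac{1}{6} & \tfrac{5}{16} \\ \tfrac{1}{4} & \tfrac{1}{3} & \tfrac{1}{2} \end{pmatrix}.$$

At what ratio must the farmer, tailor, and carpenter produce in order
for equilibrium to be attained?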


12. A certain economy consists of two sectors: goods and services. Suppose
that 60% of all goods and 30% of all services are used in the production
of goods. What proportion of the total economic output is used in the
production of goods?


13. In the notation of the open model of Leontief, suppose that

A = [matrix illegible in source]   and   d = [vector illegible in source]

are the input-output matrix and the demand vector, respectively. How
much of each commodity must be produced to satisfy this demand?
14. A certain economy consisting of the two sectors of goods and services
supports a defense system that consumes $90 billion worth of goods and
$20 billion worth of services from the economy but does not contribute
to economic production. Suppose that 50 cents worth of goods and
20 cents worth of services are required to produce $1 worth of goods
and that 30 cents worth of goods and 60 cents worth of services are
required to produce $1 worth of services. What must the total output
of the economic system be to support this defense system?


3.4 SYSTEMS OF LINEAR EQUATIONS—COMPUTATIONAL ASPECTS


In Section 3.3, we obtained a necessary and sufficient condition for a system
of linear equations to have solutions (Theorem 3.11, p. 174) and learned how
to express the solutions to a nonhomogeneous system in terms of solutions
to the corresponding homogeneous system (Theorem 3.9, p. 172). The latter
result enables us to determine all the solutions to a given system if we can 
find one solution to the given system and a basis for the solution set of the 
corresponding homogeneous system. In this section, we use elementary row 
operations to accomplish these two objectives simultaneously. The essence of 
this technique is to transform a given system of linear equations into a system 
having the same solutions, but which is easier to solve (as in Section 1.4). 


Definition. Two systems of linear equations are called equivalent if 
they have the same solution set. 


The following theorem and corollary give a useful method for obtaining 
equivalent systems. 


Theorem 3.13. Let Ax = b be a system of m linear equations in n
unknowns, and let C be an invertible m × m matrix. Then the system
(CA)x = Cb is equivalent to Ax = b.

Proof. Let K be the solution set for Ax = b and K' the solution set for
(CA)x = Cb. If w ∈ K, then Aw = b. So (CA)w = Cb, and hence w ∈ K'.
Thus K ⊆ K'.

Conversely, if w ∈ K', then (CA)w = Cb. Hence

$$Aw = C^{-1}(CAw) = C^{-1}(Cb) = b;$$

so w ∈ K. Thus K' ⊆ K, and therefore, K = K'. ■


Corollary. Let Ax = b be a system of m linear equations in n unknowns. 
If (A′|b′) is obtained from (A|b) by a finite number of elementary row 
operations, then the system A′x = b′ is equivalent to the original system. 
Proof. Suppose that (A′|b′) is obtained from (A|b) by elementary row 
operations. These may be executed by multiplying (A|b) by elementary m × m 
matrices $E_1, E_2, \ldots, E_p$. Let $C = E_p \cdots E_2E_1$; then 

$$(A'|b') = C(A|b) = (CA|Cb).$$

Since each $E_i$ is invertible, so is C. Now A′ = CA and b′ = Cb. Thus by 
Theorem 3.13, the system A′x = b′ is equivalent to the system Ax = b. ■ 
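The proof of the corollary can be illustrated on the example system worked out next in this section. The sketch below is a minimal illustration, assuming matrices stored as Python lists of rows with exact `Fraction` entries; the helper names `swap_matrix`, `scale_matrix`, and `add_matrix` are hypothetical. It builds the elementary matrices for the first four row operations of that example, forms C = E₄E₃E₂E₁, and checks that a solution of Ax = b also solves (CA)x = Cb.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product XY of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(m):
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]

def swap_matrix(m, i, j):
    """Type 1: elementary matrix that interchanges rows i and j."""
    E = identity(m)
    E[i], E[j] = E[j], E[i]
    return E

def scale_matrix(m, i, c):
    """Type 2: elementary matrix that multiplies row i by nonzero c."""
    E = identity(m)
    E[i][i] = Fraction(c)
    return E

def add_matrix(m, src, dst, c):
    """Type 3: elementary matrix that adds c times row src to row dst."""
    E = identity(m)
    E[dst][src] = Fraction(c)
    return E

# Augmented matrix (A|b) of the example system of this section.
Ab = [[Fraction(v) for v in row]
      for row in [[3, 2, 3, -2, 1], [1, 1, 1, 0, 3], [1, 2, 1, -1, 2]]]

E1 = swap_matrix(3, 0, 2)       # interchange rows 1 and 3
E2 = add_matrix(3, 0, 1, -1)    # add -1 times row 1 to row 2
E3 = add_matrix(3, 0, 2, -3)    # add -3 times row 1 to row 3
E4 = scale_matrix(3, 1, -1)     # multiply row 2 by -1
C = matmul(E4, matmul(E3, matmul(E2, E1)))   # C = E4 E3 E2 E1
CAb = matmul(C, Ab)                          # C(A|b) = (CA|Cb)

# x = (1, 2, 0, 3) solves Ax = b, so it should also solve (CA)x = Cb.
x = [Fraction(v) for v in (1, 2, 0, 3)]
residuals = [sum(row[k] * x[k] for k in range(4)) - row[4] for row in CAb]
print(CAb == [[1, 2, 1, -1, 2], [0, 1, 0, -1, -1], [0, -4, 0, 1, -5]])
print(residuals)
```

Because each elementary matrix is invertible, so is C, which is exactly what the proof uses.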


We now describe a method for solving any system of linear equations. 
Consider, for example, the system of linear equations 


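$$\begin{aligned} 3x_1 + 2x_2 + 3x_3 - 2x_4 &= 1\\ x_1 + x_2 + x_3 &= 3\\ x_1 + 2x_2 + x_3 - x_4 &= 2. \end{aligned}$$

First, we form the augmented matrix 

$$\left(\begin{array}{rrrr|r} 3 & 2 & 3 & -2 & 1\\ 1 & 1 & 1 & 0 & 3\\ 1 & 2 & 1 & -1 & 2 \end{array}\right).$$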


By using elementary row operations, we transform the augmented matrix 
into an upper triangular matrix in which the first nonzero entry of each row 
is 1, and it occurs in a column to the right of the first nonzero entry of each 
preceding row. (Recall that a matrix A is upper triangular if $A_{ij} = 0$ whenever 
i > j.) 
1. In the leftmost nonzero column, create a 1 in the first row. In our 
example, we can accomplish this step by interchanging the first and 
third rows. The resulting matrix is 

$$\left(\begin{array}{rrrr|r} 1 & 2 & 1 & -1 & 2\\ 1 & 1 & 1 & 0 & 3\\ 3 & 2 & 3 & -2 & 1 \end{array}\right).$$


2. By means of type 3 row operations, use the first row to obtain zeros in 
the remaining positions of the leftmost nonzero column. In our example, 
we must add −1 times the first row to the second row and then add −3 
times the first row to the third row to obtain 

$$\left(\begin{array}{rrrr|r} 1 & 2 & 1 & -1 & 2\\ 0 & -1 & 0 & 1 & 1\\ 0 & -4 & 0 & 1 & -5 \end{array}\right).$$


3. Create a 1 in the next row in the leftmost possible column, without using 
previous row(s). In our example, the second column is the leftmost 
possible column, and we can obtain a 1 in the second row, second column 
by multiplying the second row by −1. This operation produces 

$$\left(\begin{array}{rrrr|r} 1 & 2 & 1 & -1 & 2\\ 0 & 1 & 0 & -1 & -1\\ 0 & -4 & 0 & 1 & -5 \end{array}\right).$$


4. Now use type 3 elementary row operations to obtain zeros below the 1 
created in the preceding step. In our example, we must add four times 
the second row to the third row. The resulting matrix is 


$$\left(\begin{array}{rrrr|r} 1 & 2 & 1 & -1 & 2\\ 0 & 1 & 0 & -1 & -1\\ 0 & 0 & 0 & -3 & -9 \end{array}\right).$$


5. Repeat steps 3 and 4 on each succeeding row until no nonzero rows 
remain. (This creates zeros below the first nonzero entry in each row.) 
In our example, this can be accomplished by multiplying the third row 
by −1/3. This operation produces 

$$\left(\begin{array}{rrrr|r} 1 & 2 & 1 & -1 & 2\\ 0 & 1 & 0 & -1 & -1\\ 0 & 0 & 0 & 1 & 3 \end{array}\right).$$


We have now obtained the desired matrix. To complete the simplification 
of the augmented matrix, we must make the first nonzero entry in each row 
the only nonzero entry in its column. (This corresponds to eliminating certain 
unknowns from all but one of the equations.) 


6. Work upward, beginning with the last nonzero row, and add multiples of 
each row to the rows above. (This creates zeros above the first nonzero 
entry in each row.) In our example, the third row is the last nonzero 
row, and the first nonzero entry of this row lies in column 4. Hence we 
add the third row to the first and second rows to obtain zeros in row 1, 
column 4 and row 2, column 4. The resulting matrix is 


$$\left(\begin{array}{rrrr|r} 1 & 2 & 1 & 0 & 5\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 0 & 1 & 3 \end{array}\right).$$


7. Repeat the process described in step 6 for each preceding row until it is 
performed with the second row, at which time the reduction process is 
complete. In our example, we must add —2 times the second row to the 
first row in order to make the first row, second column entry become 
zero. This operation produces 


$$\left(\begin{array}{rrrr|r} 1 & 0 & 1 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 0 & 1 & 3 \end{array}\right).$$
We have now obtained the desired reduction of the augmented matrix. 
This matrix corresponds to the system of linear equations 

$$\begin{aligned} x_1 + x_3 &= 1\\ x_2 &= 2\\ x_4 &= 3. \end{aligned}$$

Recall that, by the corollary to Theorem 3.13, this system is equivalent to 
the original system. But this system is easily solved. Obviously $x_2 = 2$ and 
$x_4 = 3$. Moreover, $x_1$ and $x_3$ can have any values provided their sum is 1. 
Letting $x_3 = t$, we then have $x_1 = 1 - t$. Thus an arbitrary solution to the 
original system has the form 

$$\begin{pmatrix} 1 - t\\ 2\\ t\\ 3 \end{pmatrix} = \begin{pmatrix} 1\\ 2\\ 0\\ 3 \end{pmatrix} + t\begin{pmatrix} -1\\ 0\\ 1\\ 0 \end{pmatrix}.$$

Observe that 

$$\left\{\begin{pmatrix} -1\\ 0\\ 1\\ 0 \end{pmatrix}\right\}$$

is a basis for the solution set of the homogeneous system of equations 
corresponding to the given system. 
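Steps 1 through 7 can be sketched in Python. This is a minimal illustration, assuming exact rational arithmetic through the standard `fractions` module; the function name `rref` is our own. One difference from the hand computation: to create a leading 1 the code takes the first row that has a nonzero entry in the current column and divides that row by the entry (a type 2 operation), instead of looking for a row that already contains a 1. Because the reduced row echelon form of a matrix is unique (as stated later in this section), the final matrix agrees with the one obtained above.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix given as a list of rows,
    computed by Gaussian elimination with exact Fraction arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots = []   # (row, column) position of each leading 1
    r = 0         # row that receives the next leading 1
    # Forward pass (steps 1-5): leading 1s with zeros below them.
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                         # no leading entry in column c
        M[r], M[p] = M[p], M[r]              # type 1: interchange rows
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]      # type 2: make the entry 1
        for i in range(r + 1, m):            # type 3: zeros below it
            f = M[i][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append((r, c))
        r += 1
        if r == m:
            break
    # Backward pass (steps 6-7): work upward, clearing entries above.
    for pr, pc in reversed(pivots):
        for i in range(pr):
            f = M[i][pc]
            M[i] = [a - f * b for a, b in zip(M[i], M[pr])]
    return M

example = [[3, 2, 3, -2, 1],
           [1, 1, 1, 0, 3],
           [1, 2, 1, -1, 2]]
for row in rref(example):
    print(" ".join(str(v) for v in row))
# 1 0 1 0 1
# 0 1 0 0 2
# 0 0 0 1 3
```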

In the preceding example we performed elementary row operations on the 
augmented matrix of the system until we obtained the augmented matrix of a 
system having properties 1, 2, and 3 on page 27. Such a matrix has a special 
name. 


Definition. A matrix is said to be in reduced row echelon form if 
the following three conditions are satisfied. 

(a) Any row containing a nonzero entry precedes any row in which all the 
entries are zero (if any). 

(b) The first nonzero entry in each row is the only nonzero entry in its 
column. 

(c) The first nonzero entry in each row is 1 and it occurs in a column to 
the right of the first nonzero entry in the preceding row. 


Example 1 


(a) The matrix on page 184 is in reduced row echelon form. Note that the 
first nonzero entry of each row is 1 and that the column containing each such 
entry has all zeros otherwise. Also note that each time we move downward to 
a new row, we must move to the right one or more columns to find the first 
nonzero entry of the new row. 


(b) The matrix 

$$\begin{pmatrix} 1 & 1 & 0\\ 0 & 1 & 0\\ 1 & 0 & 1 \end{pmatrix}$$

is not in reduced row echelon form, because the first column, which contains 
the first nonzero entry in row 1, contains another nonzero entry. Similarly, 
the matrix 

$$\begin{pmatrix} 0 & 1 & 0 & 2\\ 1 & 0 & 0 & 1\\ 0 & 0 & 1 & 1 \end{pmatrix}$$

is not in reduced row echelon form, because the first nonzero entry of the 
second row is not to the right of the first nonzero entry of the first row. 
Finally, the matrix 

$$\begin{pmatrix} 2 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}$$

is not in reduced row echelon form, because the first nonzero entry of the first 
row is not 1. 
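Conditions (a), (b), and (c) of the definition translate directly into a mechanical check. The function below is a hypothetical helper written only for illustration; it takes a matrix as a list of rows of numbers and is applied to the final matrix of the worked example and to two of the matrices in Example 1(b).

```python
def leading_column(row):
    """Column index of the first nonzero entry of a row, or None."""
    return next((j for j, v in enumerate(row) if v != 0), None)

def is_rref(M):
    """Test conditions (a), (b), (c) of reduced row echelon form."""
    leads = [leading_column(row) for row in M]
    # (a) every row with a nonzero entry precedes every zero row.
    nonzero = [c is not None for c in leads]
    if nonzero != sorted(nonzero, reverse=True):
        return False
    prev = -1
    for i, c in enumerate(leads):
        if c is None:
            break
        # (c) the first nonzero entry is 1 and lies right of the previous one.
        if M[i][c] != 1 or c <= prev:
            return False
        prev = c
        # (b) it is the only nonzero entry in its column.
        if any(M[k][c] != 0 for k in range(len(M)) if k != i):
            return False
    return True

print(is_rref([[1, 0, 1, 0, 1], [0, 1, 0, 0, 2], [0, 0, 0, 1, 3]]))  # True
print(is_rref([[1, 1, 0], [0, 1, 0], [1, 0, 1]]))  # False: fails (b)
print(is_rref([[2, 0, 0], [0, 1, 0]]))             # False: leading entry 2
```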


It can be shown (see the corollary to Theorem 3.16) that the reduced 
row echelon form of a matrix is unique; that is, if different sequences of 
elementary row operations are used to transform a matrix into matrices Q 
and Q’ in reduced row echelon form, then Q = Q’. Thus, although there are 
many different sequences of elementary row operations that can be used to 
transform a given matrix into reduced row echelon form, they all produce the 
same result. 

The procedure described on pages 183-185 for reducing an augmented 
matrix to reduced row echelon form is called Gaussian elimination. It 
consists of two separate parts. 


1. In the forward pass (steps 1-5), the augmented matrix is transformed 
into an upper triangular matrix in which the first nonzero entry of each 
row is 1, and it occurs in a column to the right of the first nonzero entry 
of each preceding row. 

2. In the backward pass or back-substitution (steps 6-7), the upper trian- 
gular matrix is transformed into reduced row echelon form by making 
the first nonzero entry of each row the only nonzero entry of its column. 


Sec. 3.4 Systems of Linear Equations—Computational Aspects 187 


Of all the methods for transforming a matrix into its reduced row echelon 
form, Gaussian elimination requires the fewest arithmetic operations. 
(For large matrices, the Gauss-Jordan method, in which the matrix is 
transformed into reduced row echelon form by using the first nonzero entry 
in each row to make zero all other entries in its column, requires 
approximately 50% more operations.) Because of this efficiency, Gaussian elimination 
is the preferred method when solving systems of linear equations on a com- 
puter. In this context, the Gaussian elimination procedure is usually modified 
in order to minimize roundoff errors. Since discussion of these techniques is 
inappropriate here, readers who are interested in such matters are referred to 
books on numerical analysis. 
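The size of the saving can be estimated by counting operations. The sketch below is a rough model under stated assumptions: it counts only multiplications and divisions, assumes an n × n coefficient matrix that is invertible (so every column receives a leading 1 with no row interchanges), and uses a single right-hand side. Under these assumptions the forward pass costs about n³/3 operations, and the backward pass only about n²/2, because when a row is used in the backward pass it contains just its leading 1 and its last entry. The Gauss-Jordan method, which clears each leading 1's column in all other rows at once, costs about n³/2. The function names are hypothetical.

```python
def gaussian_ops(n):
    """Multiplications and divisions for Gaussian elimination on an
    n x (n + 1) augmented matrix, assuming every column of the n x n
    coefficient matrix receives a leading 1 without row interchanges."""
    ops = 0
    for k in range(n):              # leading 1 in row k, column k (0-based)
        width = n - k               # entries to its right, including b
        ops += width                # divide the row by its leading entry
        ops += (n - 1 - k) * width  # clear the n - 1 - k entries below
    # Backward pass: row k then holds only its leading 1 and its last
    # entry, so clearing the k entries above costs k multiplications.
    ops += sum(range(n))
    return ops

def gauss_jordan_ops(n):
    """Same count when each leading 1 clears its column in all other rows."""
    ops = 0
    for k in range(n):
        width = n - k
        ops += width + (n - 1) * width
    return ops

for n in (10, 100, 1000):
    g, gj = gaussian_ops(n), gauss_jordan_ops(n)
    print(n, g, gj, round(gj / g, 3))
```

As n grows, the ratio of the two counts approaches 3/2: under this model Gauss-Jordan needs roughly half again as many multiplications and divisions as Gaussian elimination.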

When a matrix is in reduced row echelon form, the corresponding sys- 
tem of linear equations is easy to solve. We present below a procedure for 
solving any system of linear equations for which the augmented matrix is in 
reduced row echelon form. First, however, we note that every matrix can be 
transformed into reduced row echelon form by Gaussian elimination. In the 
forward pass, we satisfy conditions (a) and (c) in the definition of reduced 
row echelon form and thereby make zero all entries below the first nonzero 
entry in each row. Then in the backward pass, we make zero all entries above 
the first nonzero entry in each row, thereby satisfying condition (b) in the 
definition of reduced row echelon form. 


Theorem 3.14. Gaussian elimination transforms any matrix into its re- 
duced row echelon form. 


We now describe a method for solving a system in which the augmented 
matrix is in reduced row echelon form. To illustrate this procedure, we con- 
sider the system 


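$$\begin{aligned} 2x_1 + 3x_2 + x_3 + 4x_4 - 9x_5 &= 17\\ x_1 + x_2 + x_3 + x_4 - 3x_5 &= 6\\ x_1 + x_2 + x_3 + 2x_4 - 5x_5 &= 8\\ 2x_1 + 2x_2 + 2x_3 + 3x_4 - 8x_5 &= 14, \end{aligned}$$

for which the augmented matrix is 

$$\left(\begin{array}{rrrrr|r} 2 & 3 & 1 & 4 & -9 & 17\\ 1 & 1 & 1 & 1 & -3 & 6\\ 1 & 1 & 1 & 2 & -5 & 8\\ 2 & 2 & 2 & 3 & -8 & 14 \end{array}\right).$$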


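Applying Gaussian elimination to the augmented matrix of the system produces 
the following sequence of matrices. 

$$\begin{aligned}
&\left(\begin{array}{rrrrr|r} 2&3&1&4&-9&17\\ 1&1&1&1&-3&6\\ 1&1&1&2&-5&8\\ 2&2&2&3&-8&14 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrrr|r} 1&1&1&1&-3&6\\ 2&3&1&4&-9&17\\ 1&1&1&2&-5&8\\ 2&2&2&3&-8&14 \end{array}\right)\\
\longrightarrow
&\left(\begin{array}{rrrrr|r} 1&1&1&1&-3&6\\ 0&1&-1&2&-3&5\\ 0&0&0&1&-2&2\\ 0&0&0&1&-2&2 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrrr|r} 1&1&1&1&-3&6\\ 0&1&-1&2&-3&5\\ 0&0&0&1&-2&2\\ 0&0&0&0&0&0 \end{array}\right)\\
\longrightarrow
&\left(\begin{array}{rrrrr|r} 1&1&1&0&-1&4\\ 0&1&-1&0&1&1\\ 0&0&0&1&-2&2\\ 0&0&0&0&0&0 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrrr|r} 1&0&2&0&-2&3\\ 0&1&-1&0&1&1\\ 0&0&0&1&-2&2\\ 0&0&0&0&0&0 \end{array}\right)
\end{aligned}$$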


The system of linear equations corresponding to this last matrix is 


$$\begin{aligned} x_1 + 2x_3 - 2x_5 &= 3\\ x_2 - x_3 + x_5 &= 1\\ x_4 - 2x_5 &= 2. \end{aligned}$$


Notice that we have ignored the last row since it consists entirely of zeros. 
To solve a system for which the augmented matrix is in reduced row 
echelon form, divide the variables into two sets. The first set consists of 
those variables that appear as leftmost variables in one of the equations of 
the system (in this case the set is $\{x_1, x_2, x_4\}$). The second set consists of 
all the remaining variables (in this case, $\{x_3, x_5\}$). To each variable in the 
second set, assign a parametric value $t_1, t_2, \ldots$ ($x_3 = t_1$, $x_5 = t_2$), and then 
solve for the variables of the first set in terms of those in the second set: 

$$\begin{aligned} x_1 &= -2x_3 + 2x_5 + 3 = -2t_1 + 2t_2 + 3\\ x_2 &= x_3 - x_5 + 1 = t_1 - t_2 + 1\\ x_4 &= 2x_5 + 2 = 2t_2 + 2. \end{aligned}$$


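Thus an arbitrary solution is of the form 

$$\begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{pmatrix} = \begin{pmatrix} -2t_1 + 2t_2 + 3\\ t_1 - t_2 + 1\\ t_1\\ 2t_2 + 2\\ t_2 \end{pmatrix} = \begin{pmatrix} 3\\ 1\\ 0\\ 2\\ 0 \end{pmatrix} + t_1\begin{pmatrix} -2\\ 1\\ 1\\ 0\\ 0 \end{pmatrix} + t_2\begin{pmatrix} 2\\ -1\\ 0\\ 2\\ 1 \end{pmatrix}.$$

Notice that 

$$\left\{\begin{pmatrix} -2\\ 1\\ 1\\ 0\\ 0 \end{pmatrix}, \begin{pmatrix} 2\\ -1\\ 0\\ 2\\ 1 \end{pmatrix}\right\}$$

is a basis for the solution set of the corresponding homogeneous system of 
equations and 

$$\begin{pmatrix} 3\\ 1\\ 0\\ 2\\ 0 \end{pmatrix}$$

is a particular solution to the original system. 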
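The procedure just described can be sketched in code. This is an illustration, not a library routine: `rref` is a compact Gaussian elimination with exact fractions, and `general_solution` is a hypothetical helper that sorts the variables into those that appear as leftmost variables (leading-1 columns) and the remaining ones, then returns a particular solution s₀ (all parameters 0) and one vector per remaining variable (that parameter 1, the others 0). It also reports inconsistency when a row has its only nonzero entry in the last column.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form by Gaussian elimination, exact arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):                          # forward pass
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(r + 1, m):
            f = M[i][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append((r, c))
        r += 1
        if r == m:
            break
    for pr, pc in reversed(pivots):             # backward pass
        for i in range(pr):
            f = M[i][pc]
            M[i] = [a - f * b for a, b in zip(M[i], M[pr])]
    return M

def general_solution(Ab):
    """Return (s0, [u1, ..., u_{n-r}]) for the system with augmented
    matrix Ab, or None if the system is inconsistent."""
    R = rref(Ab)
    n = len(R[0]) - 1                           # number of unknowns
    pivot_row = {}                              # leading column -> row
    for i, row in enumerate(R):
        lead = next((j for j, v in enumerate(row) if v != 0), None)
        if lead == n:
            return None                         # row (0 ... 0 | nonzero)
        if lead is not None:
            pivot_row[lead] = i
    free = [j for j in range(n) if j not in pivot_row]
    s0 = [Fraction(0)] * n
    for c, i in pivot_row.items():
        s0[c] = R[i][n]
    basis = []
    for f in free:
        u = [Fraction(0)] * n
        u[f] = Fraction(1)
        for c, i in pivot_row.items():
            u[c] = -R[i][f]
        basis.append(u)
    return s0, basis

Ab = [[2, 3, 1, 4, -9, 17], [1, 1, 1, 1, -3, 6],
      [1, 1, 1, 2, -5, 8], [2, 2, 2, 3, -8, 14]]
s0, basis = general_solution(Ab)
print([int(v) for v in s0])                   # [3, 1, 0, 2, 0]
print([[int(v) for v in u] for u in basis])   # [[-2, 1, 1, 0, 0], [2, -1, 0, 2, 1]]
```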

Therefore, in simplifying the augmented matrix of the system to reduced 
row echelon form, we are in effect simultaneously finding a particular solu- 
tion to the original system and a basis for the solution set of the associated 
homogeneous system. Moreover, this procedure detects when a system is in- 
consistent, for by Exercise 3, solutions exist if and only if, in the reduction of 
the augmented matrix to reduced row echelon form, we do not obtain a row 
in which the only nonzero entry lies in the last column. 

Thus to use this procedure for solving a system Ax = b of m linear equations 
in n unknowns, we need only begin to transform the augmented matrix 
(A|b) into its reduced row echelon form (A′|b′) by means of Gaussian 
elimination. If a row is obtained in which the only nonzero entry lies in the last 
column, then the original system is inconsistent. Otherwise, discard any zero 
rows from (A′|b′), and write the corresponding system of equations. Solve 
this system as described above to obtain an arbitrary solution of the form 

$$s = s_0 + t_1u_1 + t_2u_2 + \cdots + t_{n-r}u_{n-r},$$

where r is the number of nonzero rows in A′ (r ≤ m). The preceding equation 
is called a general solution of the system Ax = b. It expresses an arbitrary 
solution s of Ax = b in terms of n − r parameters. The following theorem 
states that s cannot be expressed in fewer than n − r parameters. 


Theorem 3.15. Let Ax = b be a system of r nonzero equations in n 
unknowns. Suppose that rank(A) = rank(A|b) and that (A|b) is in reduced 
row echelon form. Then 

(a) rank(A) = r. 
(b) If the general solution obtained by the procedure above is of the form 

$$s = s_0 + t_1u_1 + t_2u_2 + \cdots + t_{n-r}u_{n-r},$$

then $\{u_1, u_2, \ldots, u_{n-r}\}$ is a basis for the solution set of the 
corresponding homogeneous system, and $s_0$ is a solution to the original system. 


Proof. Since (A|b) is in reduced row echelon form, (A|b) must have r 
nonzero rows. Clearly these rows are linearly independent by the definition 
of the reduced row echelon form, and so rank(A|b) = r. Thus rank(A) = r. 


190 Chap. 3. Elementary Matrix Operations and Systems of Linear Equations 


Let K be the solution set for Ax = b, and let $K_H$ be the solution set for 
Ax = 0. Setting $t_1 = t_2 = \cdots = t_{n-r} = 0$, we see that $s = s_0 \in K$. But by 
Theorem 3.9 (p. 172), $K = \{s_0\} + K_H$. Hence 

$$K_H = \{-s_0\} + K = \operatorname{span}(\{u_1, u_2, \ldots, u_{n-r}\}).$$

Because rank(A) = r, we have $\dim(K_H) = n - r$. Thus since $\dim(K_H) = n - r$ 
and $K_H$ is generated by a set $\{u_1, u_2, \ldots, u_{n-r}\}$ containing at most n − r 
vectors, we conclude that this set is a basis for $K_H$. ■ 


An Interpretation of the Reduced Row Echelon Form 


Let A be an m × n matrix with columns $a_1, a_2, \ldots, a_n$, and let B be the 
reduced row echelon form of A. Denote the columns of B by $b_1, b_2, \ldots, b_n$. If 
the rank of A is r, then the rank of B is also r by the corollary to Theorem 3.4 
(p. 153). Because B is in reduced row echelon form, no nonzero row of B can 
be a linear combination of the other rows of B. Hence B must have exactly 
r nonzero rows, and if r ≥ 1, the vectors $e_1, e_2, \ldots, e_r$ must occur among the 
columns of B. For i = 1, 2, ..., r, let $j_i$ denote a column number of B such 
that $b_{j_i} = e_i$. We claim that $a_{j_1}, a_{j_2}, \ldots, a_{j_r}$, the columns of A corresponding 
to these columns of B, are linearly independent. For suppose that there are 
scalars $c_1, c_2, \ldots, c_r$ such that 

$$c_1a_{j_1} + c_2a_{j_2} + \cdots + c_ra_{j_r} = 0.$$

Because B can be obtained from A by a sequence of elementary row operations, 
there exists (as in the proof of the corollary to Theorem 3.13) an 
invertible m × m matrix M such that MA = B. Multiplying the preceding 
equation by M yields 

$$c_1Ma_{j_1} + c_2Ma_{j_2} + \cdots + c_rMa_{j_r} = 0.$$

Since $Ma_{j_i} = b_{j_i} = e_i$, it follows that 

$$c_1e_1 + c_2e_2 + \cdots + c_re_r = 0.$$

Hence $c_1 = c_2 = \cdots = c_r = 0$, proving that the vectors $a_{j_1}, a_{j_2}, \ldots, a_{j_r}$ are 
linearly independent. 
Because B has only r nonzero rows, every column of B has the form 

$$\begin{pmatrix} d_1\\ d_2\\ \vdots\\ d_r\\ 0\\ \vdots\\ 0 \end{pmatrix}$$

for scalars $d_1, d_2, \ldots, d_r$. The corresponding column of A must be 

$$\begin{aligned} M^{-1}(d_1e_1 + d_2e_2 + \cdots + d_re_r) &= d_1M^{-1}e_1 + d_2M^{-1}e_2 + \cdots + d_rM^{-1}e_r\\ &= d_1M^{-1}b_{j_1} + d_2M^{-1}b_{j_2} + \cdots + d_rM^{-1}b_{j_r}\\ &= d_1a_{j_1} + d_2a_{j_2} + \cdots + d_ra_{j_r}. \end{aligned}$$


The next theorem summarizes these results. 


Theorem 3.16. Let A be an m × n matrix of rank r, where r > 0, and 
let B be the reduced row echelon form of A. Then 
(a) The number of nonzero rows in B is r. 
(b) For each i = 1, 2, ..., r, there is a column $b_{j_i}$ of B such that $b_{j_i} = e_i$. 
(c) The columns of A numbered $j_1, j_2, \ldots, j_r$ are linearly independent. 
(d) For each k = 1, 2, ..., n, if column k of B is $d_1e_1 + d_2e_2 + \cdots + d_re_r$, then 
column k of A is $d_1a_{j_1} + d_2a_{j_2} + \cdots + d_ra_{j_r}$. 


Corollary. The reduced row echelon form of a matrix is unique. 


Proof. Exercise. (See Exercise 15.) ■ 
Example 2 
Let 

$$A = \begin{pmatrix} 2 & 4 & 6 & 2 & 4\\ 1 & 2 & 3 & 1 & 1\\ 2 & 4 & 8 & 0 & 0\\ 3 & 6 & 7 & 5 & 9 \end{pmatrix}.$$

The reduced row echelon form of A is 

$$B = \begin{pmatrix} 1 & 2 & 0 & 4 & 0\\ 0 & 0 & 1 & -1 & 0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Since B has three nonzero rows, the rank of A is 3. The first, third, and fifth 
columns of B are $e_1, e_2$, and $e_3$; so Theorem 3.16(c) asserts that the first, 
third, and fifth columns of A are linearly independent. 

Let the columns of A be denoted $a_1, a_2, a_3, a_4$, and $a_5$. Because the second 
column of B is $2e_1$, it follows from Theorem 3.16(d) that $a_2 = 2a_1$, as is easily 
checked. Moreover, since the fourth column of B is $4e_1 + (-1)e_2$, the same 
result shows that 

$$a_4 = 4a_1 + (-1)a_3.$$
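Theorem 3.16(b) through (d) can be checked mechanically on this example. The sketch below assumes matrices stored as lists of rows and exact `Fraction` arithmetic; `rref` is a compact Gaussian elimination written for this illustration, and `pivots` holds the (0-based) column numbers $j_1, \ldots, j_r$.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form by Gaussian elimination, exact arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    piv, r = [], 0
    for c in range(n):                          # forward pass
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(r + 1, m):
            f = M[i][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        piv.append((r, c))
        r += 1
        if r == m:
            break
    for pr, pc in reversed(piv):                # backward pass
        for i in range(pr):
            f = M[i][pc]
            M[i] = [a - f * b for a, b in zip(M[i], M[pr])]
    return M

A = [[2, 4, 6, 2, 4], [1, 2, 3, 1, 1], [2, 4, 8, 0, 0], [3, 6, 7, 5, 9]]
B = rref(A)
# Columns of B equal to e_1, ..., e_r (Theorem 3.16(b)).
pivots = [next(j for j, v in enumerate(row) if v != 0) for row in B if any(row)]
r = len(pivots)
# Theorem 3.16(d): column k of A is d_1 a_{j_1} + ... + d_r a_{j_r},
# where d_1, ..., d_r are the first r entries of column k of B.
for k in range(len(A[0])):
    combo = [sum(B[i][k] * A[row][pivots[i]] for i in range(r))
             for row in range(len(A))]
    assert combo == [A[row][k] for row in range(len(A))]
print(r, pivots)   # 3 [0, 2, 4]: the first, third, and fifth columns
```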
In Example 6 of Section 1.6, we extracted a basis for $\mathsf{R}^3$ from the 
generating set 

$$S = \{(2,-3,5), (8,-12,20), (1,0,-2), (0,2,-1), (7,2,0)\}.$$

The procedure described there can be streamlined by using Theorem 3.16. 
We begin by noting that if S were linearly independent, then S would be a 
basis for $\mathsf{R}^3$. In this case, it is clear that S is linearly dependent because 
S contains more than $\dim(\mathsf{R}^3) = 3$ vectors. Nevertheless, it is instructive 
to consider the calculation that is needed to determine whether S is linearly 
dependent or linearly independent. Recall that S is linearly dependent if 
there are scalars $c_1, c_2, c_3, c_4$, and $c_5$, not all zero, such that 

$$c_1(2,-3,5) + c_2(8,-12,20) + c_3(1,0,-2) + c_4(0,2,-1) + c_5(7,2,0) = (0,0,0).$$


Thus S is linearly dependent if and only if the system of linear equations 


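$$\begin{aligned} 2c_1 + 8c_2 + c_3 + 7c_5 &= 0\\ -3c_1 - 12c_2 + 2c_4 + 2c_5 &= 0\\ 5c_1 + 20c_2 - 2c_3 - c_4 &= 0 \end{aligned}$$

has a nonzero solution. The augmented matrix of this system of equations is 

$$A = \begin{pmatrix} 2 & 8 & 1 & 0 & 7 & 0\\ -3 & -12 & 0 & 2 & 2 & 0\\ 5 & 20 & -2 & -1 & 0 & 0 \end{pmatrix},$$

and its reduced row echelon form is 

$$B = \begin{pmatrix} 1 & 4 & 0 & 0 & 2 & 0\\ 0 & 0 & 1 & 0 & 3 & 0\\ 0 & 0 & 0 & 1 & 4 & 0 \end{pmatrix}.$$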


Using the technique described earlier in this section, we can find nonzero 
solutions of the preceding system, confirming that S is linearly dependent. 
However, Theorem 3.16(c) gives us additional information. Since the first, 
third, and fourth columns of B are e1,e2, and e3, we conclude that the first, 
third, and fourth columns of A are linearly independent. But the columns 
of A other than the last column (which is the zero vector) are vectors in S. 
Hence 


$$\beta = \{(2,-3,5), (1,0,-2), (0,2,-1)\}$$

is a linearly independent subset of S. It follows from (b) of Corollary 2 to the 
replacement theorem (p. 47) that β is a basis for $\mathsf{R}^3$. 
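The streamlined procedure amounts to keeping the vectors whose positions are the columns of B equal to $e_1, e_2, \ldots$. A minimal sketch, assuming vectors given as tuples of integers; `rref` and `basis_from_columns` are hypothetical helpers written for illustration. The zero column of the augmented matrix is omitted, since it never contains a leading 1.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form by Gaussian elimination, exact arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    piv, r = [], 0
    for c in range(n):                          # forward pass
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(r + 1, m):
            f = M[i][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        piv.append((r, c))
        r += 1
        if r == m:
            break
    for pr, pc in reversed(piv):                # backward pass
        for i in range(pr):
            f = M[i][pc]
            M[i] = [a - f * b for a, b in zip(M[i], M[pr])]
    return M

def basis_from_columns(vectors):
    """Keep the vectors whose positions are leading-1 columns of the RREF
    of the matrix having the given vectors as columns (Theorem 3.16(c))."""
    A = [list(row) for row in zip(*vectors)]    # vectors become columns
    R = rref(A)
    pivots = [next(j for j, v in enumerate(row) if v != 0)
              for row in R if any(row)]
    return [vectors[j] for j in pivots]

S = [(2, -3, 5), (8, -12, 20), (1, 0, -2), (0, 2, -1), (7, 2, 0)]
print(basis_from_columns(S))   # [(2, -3, 5), (1, 0, -2), (0, 2, -1)]
```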

Because every finite-dimensional vector space over F is isomorphic to $F^n$ 
for some n, a similar approach can be used to reduce any finite generating 
set to a basis. This technique is illustrated in the next example. 
Example 3 
The set 

$$S = \{2 + x + 2x^2 + 3x^3,\ 4 + 2x + 4x^2 + 6x^3,\ 6 + 3x + 8x^2 + 7x^3,\ 2 + x + 5x^3,\ 4 + x + 9x^3\}$$

generates a subspace V of $\mathsf{P}_3(\mathsf{R})$. To find a subset of S that is a basis for V, 
we consider the subset 

$$S' = \{(2,1,2,3), (4,2,4,6), (6,3,8,7), (2,1,0,5), (4,1,0,9)\}$$

consisting of the images of the polynomials in S under the standard 
representation of $\mathsf{P}_3(\mathsf{R})$ with respect to the standard ordered basis. Note that the 
4 × 5 matrix in which the columns are the vectors in S′ is the matrix A in 
Example 2. From the reduced row echelon form of A, which is the matrix B 
in Example 2, we see that the first, third, and fifth columns of A are linearly 
independent and the second and fourth columns of A are linear combinations 
of the first, third, and fifth columns. Hence 

$$\{(2,1,2,3), (6,3,8,7), (4,1,0,9)\}$$

is a basis for the subspace of $\mathsf{R}^4$ that is generated by S′. It follows that 

$$\{2 + x + 2x^2 + 3x^3,\ 6 + 3x + 8x^2 + 7x^3,\ 4 + x + 9x^3\}$$

is a basis for the subspace V of $\mathsf{P}_3(\mathsf{R})$. 


We conclude this section by describing a method for extending a linearly 
independent subset S of a finite-dimensional vector space V to a basis for V. 
Recall that this is always possible by (c) of Corollary 2 to the replacement 
theorem (p. 47). Our approach is based on the replacement theorem and 
assumes that we can find an explicit basis β for V. Let S′ be the ordered set 
consisting of the vectors in S followed by those in β. Since β ⊆ S′, the set 
S′ generates V. We can then apply the technique described above to reduce 
this generating set to a basis for V containing S. 


Example 4 
Let 

$$V = \{(x_1, x_2, x_3, x_4, x_5) \in \mathsf{R}^5 : x_1 + 7x_2 + 5x_3 - 4x_4 + 2x_5 = 0\}.$$

It is easily verified that V is a subspace of $\mathsf{R}^5$ and that 

$$S = \{(-2,0,0,-1,-1), (1,1,-2,-1,-1), (-5,1,0,1,1)\}$$

is a linearly independent subset of V. 




To extend S to a basis for V, we first obtain a basis β for V. To do so, 
we solve the system of linear equations that defines V. Since in this case V is 
defined by a single equation, we need only write the equation as 

$$x_1 = -7x_2 - 5x_3 + 4x_4 - 2x_5$$

and assign parametric values to $x_2, x_3, x_4$, and $x_5$. If $x_2 = t_1$, $x_3 = t_2$, 
$x_4 = t_3$, and $x_5 = t_4$, then the vectors in V have the form 

$$\begin{aligned} (x_1, x_2, x_3, x_4, x_5) &= (-7t_1 - 5t_2 + 4t_3 - 2t_4,\ t_1,\ t_2,\ t_3,\ t_4)\\ &= t_1(-7,1,0,0,0) + t_2(-5,0,1,0,0) + t_3(4,0,0,1,0) + t_4(-2,0,0,0,1). \end{aligned}$$

Hence 

$$\beta = \{(-7,1,0,0,0), (-5,0,1,0,0), (4,0,0,1,0), (-2,0,0,0,1)\}$$

is a basis for V by Theorem 3.15. 

The matrix whose columns consist of the vectors in S followed by those 
in β is 

$$\begin{pmatrix} -2 & 1 & -5 & -7 & -5 & 4 & -2\\ 0 & 1 & 1 & 1 & 0 & 0 & 0\\ 0 & -2 & 0 & 0 & 1 & 0 & 0\\ -1 & -1 & 1 & 0 & 0 & 1 & 0\\ -1 & -1 & 1 & 0 & 0 & 0 & 1 \end{pmatrix},$$

and its reduced row echelon form is 

$$\begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & -1\\ 0 & 1 & 0 & 0 & -\tfrac{1}{2} & 0 & 0\\ 0 & 0 & 1 & 1 & \tfrac{1}{2} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Thus 

$$\{(-2,0,0,-1,-1), (1,1,-2,-1,-1), (-5,1,0,1,1), (4,0,0,1,0)\}$$

is a basis for V containing S. 
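The same column test extends S to a basis once S is listed before β. A minimal sketch, assuming vectors as integer tuples; `rref` and `basis_from_columns` are hypothetical helpers for illustration. The first check confirms that every vector of S and β satisfies the equation defining V.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form by Gaussian elimination, exact arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    piv, r = [], 0
    for c in range(n):                          # forward pass
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]
        for i in range(r + 1, m):
            f = M[i][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        piv.append((r, c))
        r += 1
        if r == m:
            break
    for pr, pc in reversed(piv):                # backward pass
        for i in range(pr):
            f = M[i][pc]
            M[i] = [a - f * b for a, b in zip(M[i], M[pr])]
    return M

def basis_from_columns(vectors):
    """Vectors at leading-1 columns of the RREF of the column matrix."""
    R = rref([list(row) for row in zip(*vectors)])
    pivots = [next(j for j, v in enumerate(row) if v != 0)
              for row in R if any(row)]
    return [vectors[j] for j in pivots]

S = [(-2, 0, 0, -1, -1), (1, 1, -2, -1, -1), (-5, 1, 0, 1, 1)]
beta = [(-7, 1, 0, 0, 0), (-5, 0, 1, 0, 0), (4, 0, 0, 1, 0), (-2, 0, 0, 0, 1)]
# Every listed vector satisfies x1 + 7x2 + 5x3 - 4x4 + 2x5 = 0.
print(all(x1 + 7*x2 + 5*x3 - 4*x4 + 2*x5 == 0
          for (x1, x2, x3, x4, x5) in S + beta))
basis = basis_from_columns(S + beta)   # S comes first, so it is kept whole
print(basis)
```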


EXERCISES 


1. Label the following statements as true or false. 


(a) If (A′|b′) is obtained from (A|b) by a finite sequence of elementary 
column operations, then the systems Ax = b and A′x = b′ are 
equivalent. 
(b) If (A′|b′) is obtained from (A|b) by a finite sequence of elementary 
row operations, then the systems Ax = b and A′x = b′ are 
equivalent. 
(c) If A is an n × n matrix with rank n, then the reduced row echelon 
form of A is $I_n$. 
(d) Any matrix can be put in reduced row echelon form by means of 
a finite sequence of elementary row operations. 
(e) If (A|b) is in reduced row echelon form, then the system Ax = b is 
consistent. 
(f) Let Ax = b be a system of m linear equations in n unknowns for 
which the augmented matrix is in reduced row echelon form. If 
this system is consistent, then the dimension of the solution set of 
Ax = 0 is n − r, where r equals the number of nonzero rows in A. 
(g) If a matrix A is transformed by elementary row operations into a 
matrix A′ in reduced row echelon form, then the number of nonzero 
rows in A′ equals the rank of A. 


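2. Use Gaussian elimination to solve the following systems of linear 
equations. 

(a) 
$$\begin{aligned} x_1 + 2x_2 - x_3 &= -1\\ 2x_1 + 2x_2 + x_3 &= 1\\ 3x_1 + 5x_2 - 2x_3 &= -1 \end{aligned}$$

(b) 
$$\begin{aligned} x_1 - 2x_2 - x_3 &= 1\\ 2x_1 - 3x_2 + x_3 &= 6\\ 3x_1 - 5x_2 &= 7\\ x_1 + 5x_3 &= 9 \end{aligned}$$

(c) 
$$\begin{aligned} x_1 + 2x_2 + 2x_4 &= 6\\ 3x_1 + 5x_2 - x_3 + 6x_4 &= 17\\ 2x_1 + 4x_2 + x_3 + 2x_4 &= 12\\ 2x_1 - 7x_3 + 11x_4 &= 7 \end{aligned}$$

(d) 
$$\begin{aligned} x_1 - x_2 - 2x_3 + 3x_4 &= -7\\ 2x_1 - x_2 + 6x_3 + 6x_4 &= -2\\ -2x_1 + x_2 - 4x_3 - 3x_4 &= 0\\ 3x_1 - 2x_2 + 9x_3 + 10x_4 &= -5 \end{aligned}$$

(e) 
$$\begin{aligned} x_1 - 4x_2 - x_3 + x_4 &= 3\\ 2x_1 - 8x_2 + x_3 - 4x_4 &= 9\\ -x_1 + 4x_2 - 2x_3 + 5x_4 &= -6 \end{aligned}$$

(f) 
$$\begin{aligned} x_1 + 2x_2 - x_3 + 3x_4 &= 2\\ 2x_1 + 4x_2 - x_3 + 6x_4 &= 5\\ x_2 + 2x_4 &= 3 \end{aligned}$$

(g) 

(h) 

(i) 
$$\begin{aligned} 3x_1 - x_2 + 2x_3 + 4x_4 + x_5 &= 2\\ x_1 - x_2 + 2x_3 + 3x_4 + x_5 &= -1\\ 2x_1 - 3x_2 + 6x_3 + 9x_4 + 4x_5 &= -5\\ 7x_1 - 2x_2 + 4x_3 + 8x_4 + x_5 &= 6 \end{aligned}$$

(j) 
$$\begin{aligned} 2x_1 + 3x_3 - 4x_5 &= 5\\ 3x_1 - 4x_2 + 8x_3 + 3x_4 &= 8\\ x_1 - x_2 + 2x_3 + x_4 - x_5 &= 2\\ -2x_1 + 5x_2 - 9x_3 - 3x_4 - 5x_5 &= -8 \end{aligned}$$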


3. Suppose that the augmented matrix of a system Ax = b is transformed 
into a matrix (A′|b′) in reduced row echelon form by a finite sequence 
of elementary row operations. 

(a) Prove that rank(A′) ≠ rank(A′|b′) if and only if (A′|b′) contains a 
row in which the only nonzero entry lies in the last column. 
(b) Deduce that Ax = b is consistent if and only if (A′|b′) contains no 
row in which the only nonzero entry lies in the last column. 


4. For each of the systems that follow, apply Exercise 3 to determine 
whether the system is consistent. If the system is consistent, find all 
solutions. Finally, find a basis for the solution set of the corresponding 
homogeneous system. 

(a) 
$$\begin{aligned} x_1 + 2x_2 - x_3 + x_4 &= 2\\ 2x_1 + x_2 + x_3 - x_4 &= 3\\ x_1 + 2x_2 - 3x_3 + 2x_4 &= 2 \end{aligned}$$

(b) 
$$\begin{aligned} x_1 + x_2 - 3x_3 + x_4 &= -2\\ x_1 + x_2 + x_3 - x_4 &= 2\\ x_1 + x_2 - x_3 &= 0 \end{aligned}$$

(c) 
$$\begin{aligned} x_1 + x_2 - 3x_3 + x_4 &= 1\\ x_1 + x_2 + x_3 - x_4 &= 2\\ x_1 + x_2 - x_3 &= 0 \end{aligned}$$


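5. Let the reduced row echelon form of A be 

$$\begin{pmatrix} 1 & 0 & 2 & 0 & -2\\ 0 & 1 & -5 & 0 & -3\\ 0 & 0 & 0 & 1 & 6 \end{pmatrix}.$$

Determine A if the first, second, and fourth columns of A are 

$$\begin{pmatrix} 1\\ -1\\ 3 \end{pmatrix}, \quad \begin{pmatrix} 0\\ -1\\ 1 \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} 1\\ -2\\ 0 \end{pmatrix},$$

respectively. 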


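6. Let the reduced row echelon form of A be 

$$\begin{pmatrix} 1 & -3 & 0 & 4 & 0 & 5\\ 0 & 0 & 1 & 3 & 0 & 2\\ 0 & 0 & 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Determine A if the first, third, and sixth columns of A are 

$$\begin{pmatrix} 1\\ -2\\ -1\\ 3 \end{pmatrix}, \quad \begin{pmatrix} -1\\ 1\\ 2\\ -4 \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} 3\\ -9\\ 2\\ 5 \end{pmatrix},$$

respectively. 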


7. It can be shown that the vectors $u_1 = (2,-3,1)$, $u_2 = (1,4,-2)$, $u_3 = 
(-8,12,-4)$, $u_4 = (1,37,-17)$, and $u_5 = (-3,-5,8)$ generate $\mathsf{R}^3$. Find 
a subset of $\{u_1, u_2, u_3, u_4, u_5\}$ that is a basis for $\mathsf{R}^3$. 


8. Let W denote the subspace of $\mathsf{R}^5$ consisting of all vectors having 
coordinates that sum to zero. The vectors 

$$\begin{aligned} u_1 &= (2,-3,4,-5,2), & u_2 &= (-6,9,-12,15,-6),\\ u_3 &= (3,-2,7,-9,1), & u_4 &= (2,-8,2,-2,6),\\ u_5 &= (-1,1,2,1,-3), & u_6 &= (0,-3,-18,9,12),\\ u_7 &= (1,0,-2,3,-2), & \text{and}\quad u_8 &= (2,-1,1,-9,7) \end{aligned}$$

generate W. Find a subset of $\{u_1, u_2, \ldots, u_8\}$ that is a basis for W. 


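9. Let W be the subspace of $M_{2\times 2}(\mathsf{R})$ consisting of the symmetric 2 × 2 
matrices. The set 

$$S = \left\{\begin{pmatrix} 0 & -1\\ -1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2\\ 2 & 3 \end{pmatrix}, \begin{pmatrix} 2 & 1\\ 1 & 9 \end{pmatrix}, \begin{pmatrix} 1 & -2\\ -2 & 4 \end{pmatrix}, \begin{pmatrix} -1 & 2\\ 2 & -1 \end{pmatrix}\right\}$$

generates W. Find a subset of S that is a basis for W. 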


10. Let 

$$V = \{(x_1, x_2, x_3, x_4, x_5) \in \mathsf{R}^5 : x_1 - 2x_2 + 3x_3 - x_4 + 2x_5 = 0\}.$$

(a) Show that S = {(0, 1, 1, 1, 0)} is a linearly independent subset of V. 
(b) Extend S to a basis for V. 

11. Let V be as in Exercise 10. 
(a) Show that S = {(1, 2, 1, 0, 0)} is a linearly independent subset of V. 
(b) Extend S to a basis for V. 


12. Let V denote the set of all solutions to the system of linear equations 

$$\begin{aligned} x_1 - x_2 + 2x_4 - 3x_5 + x_6 &= 0\\ 2x_1 - x_2 - x_3 + 3x_4 - 4x_5 + 4x_6 &= 0. \end{aligned}$$

(a) Show that S = {(0, −1, 0, 1, 1, 0), (1, 0, 1, 1, 1, 0)} is a linearly 
independent subset of V. 
(b) Extend S to a basis for V. 

13. Let V be as in Exercise 12. 
(a) Show that S = {(1, 0, 1, 1, 1, 0), (0, 2, 1, 1, 0, 0)} is a linearly 
independent subset of V. 
(b) Extend S to a basis for V. 

14. If (A|b) is in reduced row echelon form, prove that A is also in reduced 
row echelon form. 

15. Prove the corollary to Theorem 3.16: The reduced row echelon form of 
a matrix is unique. 
Determinants 


4.1 Determinants of Order 2 

4.2 Determinants of Order n 

4.3 Properties of Determinants 

4.4 Summary — Important Facts about Determinants 
4.5* A Characterization of the Determinant 


The determinant, which has played a prominent role in the theory of lin- 
ear algebra, is a special scalar-valued function defined on the set of square 
matrices. Although it still has a place in the study of linear algebra and its 
applications, its role is less central than in former times. Yet no linear algebra 
book would be complete without a systematic treatment of the determinant, 
and we present one here. However, the main use of determinants in this book 
is to compute and establish the properties of eigenvalues, which we discuss in 
Chapter 5. 

Although the determinant is not a linear transformation on $M_{n\times n}(F)$ 
for n > 1, it does possess a kind of linearity (called n-linearity) as well 
as other properties that are examined in this chapter. In Section 4.1, we 
consider the determinant on the set of 2 × 2 matrices and derive its important 
properties and develop an efficient computational procedure. To illustrate the 
important role that determinants play in geometry, we also include optional 
material that explores the applications of the determinant to the study of 
area and orientation. In Sections 4.2 and 4.3, we extend the definition of the 
determinant to all square matrices and derive its important properties and 
develop an efficient computational procedure. For the reader who prefers to 
treat determinants lightly, Section 4.4 contains the essential properties that 
are needed in later chapters. Finally, Section 4.5, which is optional, offers 
an axiomatic approach to determinants by showing how to characterize the 
determinant in terms of three key properties. 


4.1 DETERMINANTS OF ORDER 2 


In this section, we define the determinant of a 2 x 2 matrix and investigate 
its geometric significance in terms of area and orientation. 
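Definition. If 

$$A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}$$

is a 2 × 2 matrix with entries from a field F, then we define the determinant 
of A, denoted det(A) or |A|, to be the scalar ad − bc. 

Example 1 
For the matrices 

$$A = \begin{pmatrix} 1 & 2\\ 3 & 4 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 3 & 2\\ 6 & 4 \end{pmatrix}$$

in $M_{2\times 2}(\mathsf{R})$, we have 

$$\det(A) = 1\cdot 4 - 2\cdot 3 = -2 \quad\text{and}\quad \det(B) = 3\cdot 4 - 2\cdot 6 = 0.$$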


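For the matrices A and B in Example 1, we have 

$$A + B = \begin{pmatrix} 4 & 4\\ 9 & 8 \end{pmatrix},$$

and so 

$$\det(A + B) = 4\cdot 8 - 4\cdot 9 = -4.$$

Since det(A + B) ≠ det(A) + det(B), the function $\det\colon M_{2\times 2}(\mathsf{R}) \to \mathsf{R}$ is 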
not a linear transformation. Nevertheless, the determinant does possess an 
important linearity property, which is explained in the following theorem. 


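Theorem 4.1. The function $\det\colon M_{2\times 2}(F) \to F$ is a linear function of 
each row of a 2 × 2 matrix when the other row is held fixed. That is, if u, v, 
and w are in $F^2$ and k is a scalar, then 

$$\det\begin{pmatrix} u + kv\\ w \end{pmatrix} = \det\begin{pmatrix} u\\ w \end{pmatrix} + k\det\begin{pmatrix} v\\ w \end{pmatrix}$$

and 

$$\det\begin{pmatrix} w\\ u + kv \end{pmatrix} = \det\begin{pmatrix} w\\ u \end{pmatrix} + k\det\begin{pmatrix} w\\ v \end{pmatrix}.$$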


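Proof. Let $u = (a_1, a_2)$, $v = (b_1, b_2)$, and $w = (c_1, c_2)$ be in $F^2$ and k be a 
scalar. Then 

$$\begin{aligned}
\det\begin{pmatrix} u\\ w \end{pmatrix} + k\det\begin{pmatrix} v\\ w \end{pmatrix}
&= \det\begin{pmatrix} a_1 & a_2\\ c_1 & c_2 \end{pmatrix} + k\det\begin{pmatrix} b_1 & b_2\\ c_1 & c_2 \end{pmatrix}\\
&= (a_1c_2 - a_2c_1) + k(b_1c_2 - b_2c_1)\\
&= (a_1 + kb_1)c_2 - (a_2 + kb_2)c_1\\
&= \det\begin{pmatrix} a_1 + kb_1 & a_2 + kb_2\\ c_1 & c_2 \end{pmatrix}
= \det\begin{pmatrix} u + kv\\ w \end{pmatrix}.
\end{aligned}$$

A similar calculation shows that 

$$\det\begin{pmatrix} w\\ u \end{pmatrix} + k\det\begin{pmatrix} w\\ v \end{pmatrix} = \det\begin{pmatrix} w\\ u + kv \end{pmatrix}.$$

■ 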
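Theorem 4.1 can be spot-checked numerically. The sketch below is a check on random integer samples, not a proof; `det2` and `check_row_linearity` are hypothetical helper names. It also recomputes the matrices of Example 1 to show that det is not additive over whole matrices.

```python
import random

def det2(top, bottom):
    """Determinant of the 2 x 2 matrix with rows top and bottom."""
    return top[0] * bottom[1] - top[1] * bottom[0]

def check_row_linearity(trials=1000, seed=0):
    """Test Theorem 4.1 on random integer vectors u, v, w and scalars k."""
    rng = random.Random(seed)
    for _ in range(trials):
        u, v, w = ([rng.randint(-9, 9) for _ in range(2)] for _ in range(3))
        k = rng.randint(-9, 9)
        u_kv = [a + k * b for a, b in zip(u, v)]
        if det2(u_kv, w) != det2(u, w) + k * det2(v, w):    # first row
            return False
        if det2(w, u_kv) != det2(w, u) + k * det2(w, v):    # second row
            return False
    return True

print(check_row_linearity())   # True

# Additivity over whole matrices fails, as in Example 1.
A = [[1, 2], [3, 4]]
B = [[3, 2], [6, 4]]
A_plus_B = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
print(det2(*A_plus_B), det2(*A) + det2(*B))   # -4 -2
```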


For the 2 × 2 matrices A and B in Example 1, it is easily checked that A 
is invertible but B is not. Note that det(A) ≠ 0 but det(B) = 0. We now 
show that this property is true in general. 


Theorem 4.2. Let $A \in M_{2\times 2}(F)$. Then the determinant of A is nonzero 
if and only if A is invertible. Moreover, if A is invertible, then 

$$A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} A_{22} & -A_{12}\\ -A_{21} & A_{11} \end{pmatrix}.$$

Proof. If det(A) ≠ 0, then we can define a matrix 

$$M = \frac{1}{\det(A)}\begin{pmatrix} A_{22} & -A_{12}\\ -A_{21} & A_{11} \end{pmatrix}.$$

A straightforward calculation shows that AM = MA = I, and so A is invertible 
and $M = A^{-1}$. 

Conversely, suppose that A is invertible. A remark on page 152 shows 
that the rank of 

$$A = \begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}$$

must be 2. Hence $A_{11} \neq 0$ or $A_{21} \neq 0$. If $A_{11} \neq 0$, add $-A_{21}/A_{11}$ times row 1 
of A to row 2 to obtain the matrix 

$$\begin{pmatrix} A_{11} & A_{12}\\ 0 & A_{22} - \dfrac{A_{12}A_{21}}{A_{11}} \end{pmatrix}.$$

Because elementary row operations are rank-preserving by the corollary to 
Theorem 3.4 (p. 153), this matrix also has rank 2, so its second row is nonzero; 
that is, 

$$A_{22} - \frac{A_{12}A_{21}}{A_{11}} \neq 0.$$

Therefore $\det(A) = A_{11}A_{22} - A_{12}A_{21} = A_{11}\left(A_{22} - \dfrac{A_{12}A_{21}}{A_{11}}\right) \neq 0$. On the other hand, if $A_{21} \neq 0$, 
we see that det(A) ≠ 0 by adding $-A_{11}/A_{21}$ times row 2 of A to row 1 and 
applying a similar argument. Thus, in either case, det(A) ≠ 0. ■ 
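The inverse formula of Theorem 4.2 is easy to test. The sketch below assumes integer (or `Fraction`) entries and uses hypothetical helper names `det2`, `inverse2`, and `matmul2`; it applies the formula to the matrices A and B of Example 1.

```python
from fractions import Fraction

def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

def inverse2(M):
    """Inverse of a 2 x 2 matrix by the formula of Theorem 4.2,
    or None when det(M) = 0 (M is then not invertible)."""
    D = det2(M)
    if D == 0:
        return None
    (a, b), (c, d) = M
    return [[Fraction(d, D), Fraction(-b, D)],
            [Fraction(-c, D), Fraction(a, D)]]

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[3, 2], [6, 4]]
A_inv = inverse2(A)
print(A_inv)              # entries -2, 1, 3/2, -1/2
print(matmul2(A, A_inv))  # the 2 x 2 identity
print(inverse2(B))        # None, since det(B) = 0
```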


In Sections 4.2 and 4.3, we extend the definition of the determinant to 
n × n matrices and show that Theorem 4.2 remains true in this more general 
context. In the remainder of this section, which can be omitted if desired, 
we explore the geometric significance of the determinant of a 2 × 2 matrix. 
In particular, we show the importance of the sign of the determinant in the 
study of orientation. 


The Area of a Parallelogram 


By the angle between two vectors in $\mathsf{R}^2$, we mean the angle with measure 
θ (0 ≤ θ ≤ π) that is formed by the vectors having the same magnitude and 
direction as the given vectors but emanating from the origin. (See Figure 4.1.) 

Figure 4.1: Angle between two vectors in $\mathsf{R}^2$ 


If β = {u, v} is an ordered basis for $\mathsf{R}^2$, we define the orientation of β 
to be the real number 

$$O\begin{pmatrix} u\\ v \end{pmatrix} = \frac{\det\begin{pmatrix} u\\ v \end{pmatrix}}{\left|\det\begin{pmatrix} u\\ v \end{pmatrix}\right|}.$$

(The denominator of this fraction is nonzero by Theorem 4.2.) Clearly 

$$O\begin{pmatrix} u\\ v \end{pmatrix} = \pm 1.$$

Notice that 

$$O\begin{pmatrix} e_1\\ e_2 \end{pmatrix} = 1 \quad\text{and}\quad O\begin{pmatrix} e_1\\ -e_2 \end{pmatrix} = -1.$$

Recall that a coordinate system {u, v} is called right-handed if u can 
be rotated in a counterclockwise direction through an angle θ (0 < θ < π) 
to coincide with v. Otherwise {u, v} is called a left-handed system. (See 
Figure 4.2.) In general (see Exercise 12), 


y y 


A right-handed coordinate system A left-handed coordinate system 


Figure 4.2 


o(%)=1 


if and only if the ordered basis {u, v} forms a right-handed coordinate system. 
For convenience, we also define 


if {u, v} is linearly dependent. 
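The orientation can be computed directly from its definition. A minimal Python sketch, assuming vectors in $\mathsf{R}^2$ are given as coordinate pairs; the names `det_rows` and `orientation` are hypothetical, and the value 1 returned for a linearly dependent pair reflects the convenience convention for dependent pairs adopted in this section.

```python
def det_rows(u, v):
    """Determinant of the 2 x 2 matrix whose rows are u and v."""
    return u[0] * v[1] - u[1] * v[0]


def orientation(u, v):
    """Orientation O(u/v) = det / |det|.

    When det == 0 ({u, v} linearly dependent) the value 1 is returned,
    following the convention for dependent pairs.
    """
    return -1 if det_rows(u, v) < 0 else 1


print(orientation((1, 0), (0, 1)))   # 1   (e1, e2)
print(orientation((1, 0), (0, -1)))  # -1  (e1, -e2)
print(orientation((1, 0), (1, 1)))   # 1   v lies 45 degrees counterclockwise from u
print(orientation((1, 1), (1, 0)))   # -1  left-handed pair
print(orientation((1, 2), (2, 4)))   # 1   linearly dependent pair
```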

Any ordered set $\{u, v\}$ in $\mathsf{R}^2$ determines a parallelogram in the following
manner. Regarding $u$ and $v$ as arrows emanating from the origin of $\mathsf{R}^2$, we
call the parallelogram having $u$ and $v$ as adjacent sides the parallelogram
determined by $u$ and $v$. (See Figure 4.3.) Observe that if the set $\{u, v\}$
is linearly dependent (i.e., if $u$ and $v$ are parallel), then the "parallelogram"
determined by $u$ and $v$ is actually a line segment, which we consider to be a
degenerate parallelogram having area zero.

Figure 4.3: Parallelograms determined by $u$ and $v$
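There is an interesting relationship between

$$A\begin{pmatrix} u \\ v \end{pmatrix},$$

the area of the parallelogram determined by $u$ and $v$, and

$$\det\begin{pmatrix} u \\ v \end{pmatrix},$$

which we now investigate. Observe first, however, that since $\det\begin{pmatrix} u \\ v \end{pmatrix}$
may be negative, we cannot expect that

$$A\begin{pmatrix} u \\ v \end{pmatrix} = \det\begin{pmatrix} u \\ v \end{pmatrix}.$$

But we can prove that

$$A\begin{pmatrix} u \\ v \end{pmatrix} = O\begin{pmatrix} u \\ v \end{pmatrix} \cdot \det\begin{pmatrix} u \\ v \end{pmatrix},$$

from which it follows that

$$A\begin{pmatrix} u \\ v \end{pmatrix} = \left|\det\begin{pmatrix} u \\ v \end{pmatrix}\right|.$$

Our argument that

$$A\begin{pmatrix} u \\ v \end{pmatrix} = O\begin{pmatrix} u \\ v \end{pmatrix} \cdot \det\begin{pmatrix} u \\ v \end{pmatrix}$$

employs a technique that, although somewhat indirect, can be generalized to
$\mathsf{R}^n$. First, since

$$O\begin{pmatrix} u \\ v \end{pmatrix} = \pm 1,$$

we may multiply both sides of the desired equation by $O\begin{pmatrix} u \\ v \end{pmatrix}$
to obtain the equivalent form

$$O\begin{pmatrix} u \\ v \end{pmatrix} \cdot A\begin{pmatrix} u \\ v \end{pmatrix} = \det\begin{pmatrix} u \\ v \end{pmatrix}.$$

We establish this equation by verifying that the three conditions of Exercise 11
are satisfied by the function

$$\delta\begin{pmatrix} u \\ v \end{pmatrix} = O\begin{pmatrix} u \\ v \end{pmatrix} \cdot A\begin{pmatrix} u \\ v \end{pmatrix}.$$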

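(a) We begin by showing that for any real number $c$

$$\delta\begin{pmatrix} u \\ cv \end{pmatrix} = c \cdot \delta\begin{pmatrix} u \\ v \end{pmatrix}.$$

Observe that this equation is valid if $c = 0$ because

$$\delta\begin{pmatrix} u \\ 0v \end{pmatrix} = O\begin{pmatrix} u \\ 0 \end{pmatrix} \cdot A\begin{pmatrix} u \\ 0 \end{pmatrix} = O\begin{pmatrix} u \\ 0 \end{pmatrix} \cdot 0 = 0.$$

So assume that $c \neq 0$. Regarding $cv$ as the base of the parallelogram
determined by $u$ and $cv$, we see that

$$A\begin{pmatrix} u \\ cv \end{pmatrix} = \text{base} \times \text{altitude} = |c|\,(\text{length of } v)(\text{altitude}) = |c| \cdot A\begin{pmatrix} u \\ v \end{pmatrix},$$

since the altitude $h$ of the parallelogram determined by $u$ and $cv$ is the same
as that in the parallelogram determined by $u$ and $v$. (See Figure 4.4.) If $\{u, v\}$
is linearly dependent, both sides of the desired equation are 0 because both areas
vanish. Otherwise $\det\begin{pmatrix} u \\ cv \end{pmatrix} = c\det\begin{pmatrix} u \\ v \end{pmatrix}$, so that
$O\begin{pmatrix} u \\ cv \end{pmatrix} = \dfrac{c}{|c|}\,O\begin{pmatrix} u \\ v \end{pmatrix}$. Hence

$$\delta\begin{pmatrix} u \\ cv \end{pmatrix} = O\begin{pmatrix} u \\ cv \end{pmatrix} \cdot A\begin{pmatrix} u \\ cv \end{pmatrix} = \frac{c}{|c|}\,O\begin{pmatrix} u \\ v \end{pmatrix} \cdot |c| \cdot A\begin{pmatrix} u \\ v \end{pmatrix} = c \cdot \delta\begin{pmatrix} u \\ v \end{pmatrix}.$$

Figure 4.4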
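A similar argument shows that

$$\delta\begin{pmatrix} cu \\ v \end{pmatrix} = c \cdot \delta\begin{pmatrix} u \\ v \end{pmatrix}.$$

We next prove that

$$\delta\begin{pmatrix} u \\ au + bw \end{pmatrix} = b \cdot \delta\begin{pmatrix} u \\ w \end{pmatrix}$$

for any $u, w \in \mathsf{R}^2$ and any real numbers $a$ and $b$. Because the parallelograms
determined by $u$ and $w$ and by $u$ and $u + w$ have a common base $u$ and the
same altitude (see Figure 4.5), it follows that

$$A\begin{pmatrix} u \\ u + w \end{pmatrix} = A\begin{pmatrix} u \\ w \end{pmatrix}.$$

Moreover, $\det\begin{pmatrix} u \\ u + w \end{pmatrix} = \det\begin{pmatrix} u \\ u \end{pmatrix} + \det\begin{pmatrix} u \\ w \end{pmatrix} = \det\begin{pmatrix} u \\ w \end{pmatrix}$
by Theorem 4.1, so the two orientations agree as well, and hence

$$\delta\begin{pmatrix} u \\ u + w \end{pmatrix} = \delta\begin{pmatrix} u \\ w \end{pmatrix}.$$

If $a = 0$, then

$$\delta\begin{pmatrix} u \\ au + bw \end{pmatrix} = \delta\begin{pmatrix} u \\ bw \end{pmatrix} = b \cdot \delta\begin{pmatrix} u \\ w \end{pmatrix}$$

by the first paragraph of (a). Otherwise, if $a \neq 0$, then

$$\delta\begin{pmatrix} u \\ au + bw \end{pmatrix} = a \cdot \delta\begin{pmatrix} u \\ u + \frac{b}{a}w \end{pmatrix} = a \cdot \delta\begin{pmatrix} u \\ \frac{b}{a}w \end{pmatrix} = b \cdot \delta\begin{pmatrix} u \\ w \end{pmatrix}.$$

So the desired conclusion is obtained in either case.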
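We are now able to show that

$$\delta\begin{pmatrix} u \\ v_1 + v_2 \end{pmatrix} = \delta\begin{pmatrix} u \\ v_1 \end{pmatrix} + \delta\begin{pmatrix} u \\ v_2 \end{pmatrix}$$

for all $u, v_1, v_2 \in \mathsf{R}^2$. Since the result is immediate if $u = 0$, we assume that
$u \neq 0$. Choose any vector $w \in \mathsf{R}^2$ such that $\{u, w\}$ is linearly independent.
Then for any vectors $v_1, v_2 \in \mathsf{R}^2$ there exist scalars $a_i$ and $b_i$ such that
$v_i = a_iu + b_iw$ ($i = 1, 2$). Thus

$$\begin{aligned}
\delta\begin{pmatrix} u \\ v_1 + v_2 \end{pmatrix} &= \delta\begin{pmatrix} u \\ (a_1 + a_2)u + (b_1 + b_2)w \end{pmatrix} = (b_1 + b_2)\,\delta\begin{pmatrix} u \\ w \end{pmatrix} \\
&= \delta\begin{pmatrix} u \\ a_1u + b_1w \end{pmatrix} + \delta\begin{pmatrix} u \\ a_2u + b_2w \end{pmatrix} = \delta\begin{pmatrix} u \\ v_1 \end{pmatrix} + \delta\begin{pmatrix} u \\ v_2 \end{pmatrix}.
\end{aligned}$$

A similar argument shows that

$$\delta\begin{pmatrix} u_1 + u_2 \\ v \end{pmatrix} = \delta\begin{pmatrix} u_1 \\ v \end{pmatrix} + \delta\begin{pmatrix} u_2 \\ v \end{pmatrix}$$

for all $u_1, u_2, v \in \mathsf{R}^2$.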
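(b) Since

$$A\begin{pmatrix} u \\ u \end{pmatrix} = 0, \quad\text{it follows that}\quad \delta\begin{pmatrix} u \\ u \end{pmatrix} = O\begin{pmatrix} u \\ u \end{pmatrix} \cdot A\begin{pmatrix} u \\ u \end{pmatrix} = 0$$

for any $u \in \mathsf{R}^2$.

(c) Because the parallelogram determined by $e_1$ and $e_2$ is the unit square,

$$\delta\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = O\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \cdot A\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = 1 \cdot 1 = 1.$$

Therefore $\delta$ satisfies the three conditions of Exercise 11, and hence $\delta = \det$.
So the area of the parallelogram determined by $u$ and $v$ equals

$$O\begin{pmatrix} u \\ v \end{pmatrix} \cdot \det\begin{pmatrix} u \\ v \end{pmatrix} = \left|\det\begin{pmatrix} u \\ v \end{pmatrix}\right|.$$

Thus we see, for example, that the area of the parallelogram determined
by $u = (-1, 5)$ and $v = (4, -2)$ is

$$\left|\det\begin{pmatrix} -1 & 5 \\ 4 & -2 \end{pmatrix}\right| = |2 - 20| = 18.$$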
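The identity "area equals the absolute value of the determinant" turns area computations into arithmetic. The Python sketch below is illustrative only (the helper names `det_rows` and `area` are hypothetical); it reproduces the example just given and checks two of the geometric facts used in the proof: shearing $v$ along $u$ preserves the area, and scaling one side by $c$ multiplies the area by $|c|$.

```python
def det_rows(u, v):
    """Determinant of the 2 x 2 matrix whose rows are u and v."""
    return u[0] * v[1] - u[1] * v[0]


def area(u, v):
    """Area of the parallelogram determined by u and v, that is |det(u/v)|."""
    return abs(det_rows(u, v))


u, v = (-1, 5), (4, -2)
print(det_rows(u, v))  # -18 (negative, so {u, v} is left-handed)
print(area(u, v))      # 18

# Replacing v by v + 3u keeps the base u and the altitude, hence the area.
sheared = (v[0] + 3 * u[0], v[1] + 3 * u[1])
print(area(u, sheared))  # 18

# Scaling one side by c = -2 multiplies the area by |c| = 2.
print(area(u, (-2 * v[0], -2 * v[1])))  # 36
```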


EXERCISES 


1. Label the following statements as true or false. 


(a) The function $\det\colon \mathsf{M}_{2\times 2}(F) \to F$ is a linear transformation.

(b) The determinant of a $2 \times 2$ matrix is a linear function of each row
of the matrix when the other row is held fixed.

(c) If $A \in \mathsf{M}_{2\times 2}(F)$ and $\det(A) = 0$, then $A$ is invertible.

(d) If $u$ and $v$ are vectors in $\mathsf{R}^2$ emanating from the origin, then the
area of the parallelogram having $u$ and $v$ as adjacent sides is

$$\det\begin{pmatrix} u \\ v \end{pmatrix}.$$

(e) A coordinate system is right-handed if and only if its orientation
equals 1.


2. Compute the determinants of the following matrices in $\mathsf{M}_{2\times 2}(R)$.


oO. wo ioe 8 


3. Compute the determinants of the following matrices in $\mathsf{M}_{2\times 2}(C)$.
-1+ i 1-4i 5-2t 6+41 2 3 

(a) ( 3 + 2i 3) p) en i Ti ) (<) € ) 

4. For each of the following pairs of vectors $u$ and $v$ in $\mathsf{R}^2$, compute the
area of the parallelogram determined by $u$ and $v$.

(a) $u = (3, -2)$ and $v = (2, 5)$

(b) $u = (1, 3)$ and $v = (-3, 1)$

(c) $u = (4, -1)$ and $v = (-6, -2)$

(d) $u = (3, 4)$ and $v = (2, -6)$


5. Prove that if $B$ is the matrix obtained by interchanging the rows of a
$2 \times 2$ matrix $A$, then $\det(B) = -\det(A)$.

6. Prove that if the two columns of $A \in \mathsf{M}_{2\times 2}(F)$ are identical, then
$\det(A) = 0$.

7. Prove that $\det(A^t) = \det(A)$ for any $A \in \mathsf{M}_{2\times 2}(F)$.

8. Prove that if $A \in \mathsf{M}_{2\times 2}(F)$ is upper triangular, then $\det(A)$ equals the
product of the diagonal entries of $A$.

9. Prove that $\det(AB) = \det(A) \cdot \det(B)$ for any $A, B \in \mathsf{M}_{2\times 2}(F)$.


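10. The classical adjoint of a $2 \times 2$ matrix $A \in \mathsf{M}_{2\times 2}(F)$ is the matrix

$$C = \begin{pmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{pmatrix}.$$

Prove that

(a) $CA = AC = [\det(A)]I$.

(b) $\det(C) = \det(A)$.

(c) The classical adjoint of $A^t$ is $C^t$.

(d) If $A$ is invertible, then $A^{-1} = [\det(A)]^{-1}C$.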


11. Let $\delta\colon \mathsf{M}_{2\times 2}(F) \to F$ be a function with the following three properties.

(i) $\delta$ is a linear function of each row of the matrix when the other row
is held fixed.

(ii) If the two rows of $A \in \mathsf{M}_{2\times 2}(F)$ are identical, then $\delta(A) = 0$.

(iii) If $I$ is the $2 \times 2$ identity matrix, then $\delta(I) = 1$.

Prove that $\delta(A) = \det(A)$ for all $A \in \mathsf{M}_{2\times 2}(F)$. (This result is generalized in Section 4.5.)


12. Let $\{u, v\}$ be an ordered basis for $\mathsf{R}^2$. Prove that

$$O\begin{pmatrix} u \\ v \end{pmatrix} = 1$$

if and only if $\{u, v\}$ forms a right-handed coordinate system. Hint:
Recall the definition of a rotation given in Example 2 of Section 2.1.


4.2 DETERMINANTS OF ORDER n


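In this section, we extend the definition of the determinant to $n \times n$ matrices
for $n \ge 3$. For this definition, it is convenient to introduce the following
notation: Given $A \in \mathsf{M}_{n\times n}(F)$, for $n \ge 2$, denote the $(n-1) \times (n-1)$ matrix
obtained from $A$ by deleting row $i$ and column $j$ by $\tilde{A}_{ij}$. Thus, for example, for

$$B = \begin{pmatrix} 1 & -1 & 2 & -1 \\ -3 & 4 & 1 & -1 \\ 2 & -5 & -3 & 8 \\ -2 & 6 & -4 & 1 \end{pmatrix} \in \mathsf{M}_{4\times 4}(R),$$

we have

$$\tilde{B}_{23} = \begin{pmatrix} 1 & -1 & -1 \\ 2 & -5 & 8 \\ -2 & 6 & 1 \end{pmatrix} \quad\text{and}\quad \tilde{B}_{42} = \begin{pmatrix} 1 & 2 & -1 \\ -3 & 1 & -1 \\ 2 & -3 & 8 \end{pmatrix}.$$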


Definitions. Let $A \in \mathsf{M}_{n\times n}(F)$. If $n = 1$, so that $A = (A_{11})$, we define
$\det(A) = A_{11}$. For $n \ge 2$, we define $\det(A)$ recursively as

$$\det(A) = \sum_{j=1}^{n} (-1)^{1+j} A_{1j} \cdot \det(\tilde{A}_{1j}).$$
The scalar $\det(A)$ is called the determinant of $A$ and is also denoted by $|A|$.
The scalar

$$(-1)^{i+j}\det(\tilde{A}_{ij})$$

is called the cofactor of the entry of $A$ in row $i$, column $j$.
Letting

$$c_{ij} = (-1)^{i+j}\det(\tilde{A}_{ij})$$

denote the cofactor of the row $i$, column $j$ entry of $A$, we can express the
formula for the determinant of $A$ as

$$\det(A) = A_{11}c_{11} + A_{12}c_{12} + \cdots + A_{1n}c_{1n}.$$

Thus the determinant of $A$ equals the sum of the products of each entry in row
1 of $A$ multiplied by its cofactor. This formula is called cofactor expansion
along the first row of $A$. Note that, for $2 \times 2$ matrices, this definition of
the determinant of $A$ agrees with the one given in Section 4.1 because

$$\det(A) = A_{11}(-1)^{1+1}\det(\tilde{A}_{11}) + A_{12}(-1)^{1+2}\det(\tilde{A}_{12}) = A_{11}A_{22} - A_{12}A_{21}.$$


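Example 1
Let

$$A = \begin{pmatrix} 1 & 3 & -3 \\ -3 & -5 & 2 \\ -4 & 4 & -6 \end{pmatrix} \in \mathsf{M}_{3\times 3}(R).$$

Using cofactor expansion along the first row of $A$, we obtain

$$\begin{aligned}
\det(A) &= (-1)^{1+1}A_{11} \cdot \det(\tilde{A}_{11}) + (-1)^{1+2}A_{12} \cdot \det(\tilde{A}_{12}) + (-1)^{1+3}A_{13} \cdot \det(\tilde{A}_{13}) \\
&= (-1)^2(1) \cdot \det\begin{pmatrix} -5 & 2 \\ 4 & -6 \end{pmatrix} + (-1)^3(3) \cdot \det\begin{pmatrix} -3 & 2 \\ -4 & -6 \end{pmatrix} + (-1)^4(-3) \cdot \det\begin{pmatrix} -3 & -5 \\ -4 & 4 \end{pmatrix} \\
&= 1[-5(-6) - 2(4)] - 3[-3(-6) - 2(-4)] - 3[-3(4) - (-5)(-4)] \\
&= 1(22) - 3(26) - 3(-32) \\
&= 40.
\end{aligned}$$

◆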
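Example 2
Let

$$B = \begin{pmatrix} 0 & 1 & 3 \\ -2 & -3 & -5 \\ 4 & -4 & 4 \end{pmatrix} \in \mathsf{M}_{3\times 3}(R).$$

Using cofactor expansion along the first row of $B$, we obtain

$$\begin{aligned}
\det(B) &= (-1)^{1+1}B_{11} \cdot \det(\tilde{B}_{11}) + (-1)^{1+2}B_{12} \cdot \det(\tilde{B}_{12}) + (-1)^{1+3}B_{13} \cdot \det(\tilde{B}_{13}) \\
&= (-1)^2(0) \cdot \det\begin{pmatrix} -3 & -5 \\ -4 & 4 \end{pmatrix} + (-1)^3(1) \cdot \det\begin{pmatrix} -2 & -5 \\ 4 & 4 \end{pmatrix} + (-1)^4(3) \cdot \det\begin{pmatrix} -2 & -3 \\ 4 & -4 \end{pmatrix} \\
&= 0 - 1[-2(4) - (-5)(4)] + 3[-2(-4) - (-3)(4)] \\
&= 0 - 1(12) + 3(20) \\
&= 48.
\end{aligned}$$

◆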


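Example 3
Let

$$C = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 3 & -3 \\ -2 & -3 & -5 & 2 \\ 4 & -4 & 4 & -6 \end{pmatrix}.$$

Using cofactor expansion along the first row of $C$ and the results of Examples 1
and 2, we obtain

$$\begin{aligned}
\det(C) &= (-1)^2(2) \cdot \det(\tilde{C}_{11}) + (-1)^3(0) \cdot \det(\tilde{C}_{12}) + (-1)^4(0) \cdot \det(\tilde{C}_{13}) + (-1)^5(1) \cdot \det(\tilde{C}_{14}) \\
&= (-1)^2(2) \cdot \det\begin{pmatrix} 1 & 3 & -3 \\ -3 & -5 & 2 \\ -4 & 4 & -6 \end{pmatrix} + 0 + 0 + (-1)^5(1) \cdot \det\begin{pmatrix} 0 & 1 & 3 \\ -2 & -3 & -5 \\ 4 & -4 & 4 \end{pmatrix} \\
&= 2(40) + 0 + 0 - 1(48) \\
&= 32.
\end{aligned}$$

◆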
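The recursive definition translates almost word for word into code. The following Python sketch is one possible rendering (the names `minor` and `det` are hypothetical, and indices are 0-based, unlike the text); it reproduces the values computed in Examples 1 to 3 and confirms that $\tilde{C}_{11}$ and $\tilde{C}_{14}$ are the matrices of Examples 1 and 2.

```python
def minor(a, i, j):
    """The matrix obtained from a by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row.

    With a 0-based column index j, the sign (-1)**(1 + (j + 1)) of the
    text simplifies to (-1)**j.
    """
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(n))


A = [[1, 3, -3], [-3, -5, 2], [-4, 4, -6]]                          # Example 1
B = [[0, 1, 3], [-2, -3, -5], [4, -4, 4]]                           # Example 2
C = [[2, 0, 0, 1], [0, 1, 3, -3], [-2, -3, -5, 2], [4, -4, 4, -6]]  # Example 3

print(det(A), det(B), det(C))                  # 40 48 32
print(minor(C, 0, 0) == A, minor(C, 0, 3) == B)  # True True
```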
Example 4

The determinant of the $n \times n$ identity matrix is 1. We prove this assertion by
mathematical induction on $n$. The result is clearly true for the $1 \times 1$ identity
matrix. Assume that the determinant of the $(n-1) \times (n-1)$ identity matrix
is 1 for some $n \ge 2$, and let $I$ denote the $n \times n$ identity matrix. Using cofactor
expansion along the first row of $I$, we obtain

$$\begin{aligned}
\det(I) &= (-1)^2(1) \cdot \det(\tilde{I}_{11}) + (-1)^3(0) \cdot \det(\tilde{I}_{12}) + \cdots + (-1)^{1+n}(0) \cdot \det(\tilde{I}_{1n}) \\
&= 1(1) + 0 + \cdots + 0 \\
&= 1
\end{aligned}$$

because $\tilde{I}_{11}$ is the $(n-1) \times (n-1)$ identity matrix. This shows that the
determinant of the $n \times n$ identity matrix is 1, and so the determinant of any
identity matrix is 1 by the principle of mathematical induction. ◆


As is illustrated in Example 3, the calculation of a determinant using 
the recursive definition is extremely tedious, even for matrices as small as 
4x4. Later in this section, we present a more efficient method for evaluating 
determinants, but we must first learn more about them. 

Recall from Theorem 4.1 (p. 200) that, although the determinant of a 2 x 2 
matrix is not a linear transformation, it is a linear function of each row when 
the other row is held fixed. We now show that a similar property is true for 
determinants of any size. 


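Theorem 4.3. The determinant of an $n \times n$ matrix is a linear function
of each row when the remaining rows are held fixed. That is, for $1 \le r \le n$,
we have

$$\det\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ u + kv \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix} = \det\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ u \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix} + k\det\begin{pmatrix} a_1 \\ \vdots \\ a_{r-1} \\ v \\ a_{r+1} \\ \vdots \\ a_n \end{pmatrix}$$

whenever $k$ is a scalar and $u$, $v$, and each $a_i$ are row vectors in $F^n$.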


Proof. The proof is by mathematical induction on $n$. The result is immediate
if $n = 1$. Assume that for some integer $n \ge 2$ the determinant of any
$(n-1) \times (n-1)$ matrix is a linear function of each row when the remaining
rows are held fixed. Let $A$ be an $n \times n$ matrix with rows $a_1, a_2, \ldots, a_n$, respectively,
and suppose that for some $r$ ($1 \le r \le n$), we have $a_r = u + kv$ for some
$u, v \in F^n$ and some scalar $k$. Let $u = (b_1, b_2, \ldots, b_n)$ and $v = (c_1, c_2, \ldots, c_n)$,
and let $B$ and $C$ be the matrices obtained from $A$ by replacing row $r$ of $A$ by
$u$ and $v$, respectively. We must prove that $\det(A) = \det(B) + k\det(C)$. We
leave the proof of this fact to the reader for the case $r = 1$. For $r > 1$ and
$1 \le j \le n$, the rows of $\tilde{A}_{1j}$, $\tilde{B}_{1j}$, and $\tilde{C}_{1j}$ are the same except for row $r - 1$.
Moreover, row $r - 1$ of $\tilde{A}_{1j}$ is

$$(b_1 + kc_1, \ldots, b_{j-1} + kc_{j-1}, b_{j+1} + kc_{j+1}, \ldots, b_n + kc_n),$$

which is the sum of row $r - 1$ of $\tilde{B}_{1j}$ and $k$ times row $r - 1$ of $\tilde{C}_{1j}$. Since $\tilde{B}_{1j}$
and $\tilde{C}_{1j}$ are $(n-1) \times (n-1)$ matrices, we have

$$\det(\tilde{A}_{1j}) = \det(\tilde{B}_{1j}) + k\det(\tilde{C}_{1j})$$

by the induction hypothesis. Thus since $A_{1j} = B_{1j} = C_{1j}$, we have

$$\begin{aligned}
\det(A) &= \sum_{j=1}^{n} (-1)^{1+j} A_{1j} \cdot \det(\tilde{A}_{1j}) \\
&= \sum_{j=1}^{n} (-1)^{1+j} A_{1j} \cdot \left[\det(\tilde{B}_{1j}) + k\det(\tilde{C}_{1j})\right] \\
&= \sum_{j=1}^{n} (-1)^{1+j} B_{1j} \cdot \det(\tilde{B}_{1j}) + k\sum_{j=1}^{n} (-1)^{1+j} C_{1j} \cdot \det(\tilde{C}_{1j}) \\
&= \det(B) + k\det(C).
\end{aligned}$$

This shows that the theorem is true for $n \times n$ matrices, and so the theorem
is true for all square matrices by mathematical induction. ∎
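Theorem 4.3 can be spot-checked numerically. This Python sketch (illustrative; the helper `with_row` and the test vectors are hypothetical choices) replaces one row of the matrix of Example 1 by $u + kv$ and compares both sides of the displayed identity for every row. The last line shows why the determinant is linear in each row but is not a linear transformation: doubling every row multiplies a $3 \times 3$ determinant by $2^3$.

```python
def minor(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def with_row(a, r, row):
    """Copy of a whose row r (0-based) is replaced by `row`."""
    return [list(row) if k == r else list(old) for k, old in enumerate(a)]


A = [[1, 3, -3], [-3, -5, 2], [-4, 4, -6]]  # Example 1, det(A) = 40
u, v, k = [2, 0, 1], [1, -1, 5], 3          # arbitrary test data

for r in range(3):
    mixed = [ui + k * vi for ui, vi in zip(u, v)]
    lhs = det(with_row(A, r, mixed))
    rhs = det(with_row(A, r, u)) + k * det(with_row(A, r, v))
    print(r, lhs == rhs)  # True for every row r

# Linear in each row separately, but not linear in the whole matrix:
print(det([[2 * x for x in row] for row in A]))  # 320 = 2**3 * 40, not 2 * 40
```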


Corollary. If $A \in \mathsf{M}_{n\times n}(F)$ has a row consisting entirely of zeros, then
$\det(A) = 0$.

Proof. See Exercise 24. ∎


The definition of a determinant requires that the determinant of a matrix 
be evaluated by cofactor expansion along the first row. Our next theorem 
shows that the determinant of a square matrix can be evaluated by cofactor 
expansion along any row. Its proof requires the following technical result. 


Lemma. Let $B \in \mathsf{M}_{n\times n}(F)$, where $n \ge 2$. If row $i$ of $B$ equals $e_k$ for
some $k$ ($1 \le k \le n$), then $\det(B) = (-1)^{i+k}\det(\tilde{B}_{ik})$.

Proof. The proof is by mathematical induction on $n$. The lemma is easily
proved for $n = 2$. Assume that for some integer $n \ge 3$, the lemma is true for
$(n-1) \times (n-1)$ matrices, and let $B$ be an $n \times n$ matrix in which row $i$ of $B$
equals $e_k$ for some $k$ ($1 \le k \le n$). The result follows immediately from the
definition of the determinant if $i = 1$. Suppose therefore that $1 < i \le n$. For
each $j \neq k$ ($1 \le j \le n$), let $C_{ij}$ denote the $(n-2) \times (n-2)$ matrix obtained
from $B$ by deleting rows 1 and $i$ and columns $j$ and $k$. For each $j$, row $i - 1$
of $\tilde{B}_{1j}$ is the following vector in $F^{n-1}$:

$$\begin{cases} e_{k-1} & \text{if } j < k \\ 0 & \text{if } j = k \\ e_k & \text{if } j > k. \end{cases}$$

Hence by the induction hypothesis and the corollary to Theorem 4.3, we have

$$\det(\tilde{B}_{1j}) = \begin{cases} (-1)^{(i-1)+(k-1)}\det(C_{ij}) & \text{if } j < k \\ 0 & \text{if } j = k \\ (-1)^{(i-1)+k}\det(C_{ij}) & \text{if } j > k. \end{cases}$$

Therefore

$$\begin{aligned}
\det(B) &= \sum_{j=1}^{n} (-1)^{1+j} B_{1j} \cdot \det(\tilde{B}_{1j}) \\
&= \sum_{j<k} (-1)^{1+j} B_{1j} \cdot \det(\tilde{B}_{1j}) + \sum_{j>k} (-1)^{1+j} B_{1j} \cdot \det(\tilde{B}_{1j}) \\
&= \sum_{j<k} (-1)^{1+j} B_{1j} \cdot \left[(-1)^{(i-1)+(k-1)}\det(C_{ij})\right] + \sum_{j>k} (-1)^{1+j} B_{1j} \cdot \left[(-1)^{(i-1)+k}\det(C_{ij})\right] \\
&= (-1)^{i+k}\left[\sum_{j<k} (-1)^{1+j} B_{1j} \cdot \det(C_{ij}) + \sum_{j>k} (-1)^{1+(j-1)} B_{1j} \cdot \det(C_{ij})\right].
\end{aligned}$$

Because the expression inside the preceding bracket is the cofactor expansion
of $\tilde{B}_{ik}$ along the first row, it follows that

$$\det(B) = (-1)^{i+k}\det(\tilde{B}_{ik}).$$

This shows that the lemma is true for $n \times n$ matrices, and so the lemma is
true for all square matrices by mathematical induction. ∎
We are now able to prove that cofactor expansion along any row can be
used to evaluate the determinant of a square matrix.

Theorem 4.4. The determinant of a square matrix can be evaluated by
cofactor expansion along any row. That is, if $A \in \mathsf{M}_{n\times n}(F)$, then for any
integer $i$ ($1 \le i \le n$),

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \cdot \det(\tilde{A}_{ij}).$$

Proof. Cofactor expansion along the first row of $A$ gives the determinant
of $A$ by definition. So the result is true if $i = 1$. Fix $i > 1$. Row $i$ of $A$ can
be written as $\sum_{j=1}^{n} A_{ij}e_j$. For $1 \le j \le n$, let $B_j$ denote the matrix obtained
from $A$ by replacing row $i$ of $A$ by $e_j$. Then by Theorem 4.3 and the lemma,
we have

$$\det(A) = \sum_{j=1}^{n} A_{ij}\det(B_j) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \cdot \det(\tilde{A}_{ij}). \qquad ∎$$
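Theorem 4.4 says that the row chosen for the expansion does not matter. A minimal Python sketch (hypothetical helper names, 0-based indices) that expands along an arbitrary row and evaluates the matrix $C$ of Example 3 along each of its four rows:

```python
def minor(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det_along_row(a, i):
    """Cofactor expansion along row i (0-based), as in Theorem 4.4.

    The sign (-1)**((i + 1) + (j + 1)) of the text equals (-1)**(i + j).
    Each minor is expanded along its own first row.
    """
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** (i + j) * a[i][j] * det_along_row(minor(a, i, j), 0)
               for j in range(n))


C = [[2, 0, 0, 1], [0, 1, 3, -3], [-2, -3, -5, 2], [4, -4, 4, -6]]  # Example 3
print([det_along_row(C, i) for i in range(4)])  # [32, 32, 32, 32]
```

In hand computation this freedom is used to pick a row with many zeros, such as the first row of $C$.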


Corollary. If $A \in \mathsf{M}_{n\times n}(F)$ has two identical rows, then $\det(A) = 0$.

Proof. The proof is by mathematical induction on $n$. We leave the proof
of the result to the reader in the case that $n = 2$. Assume that for some
integer $n \ge 3$, it is true for $(n-1) \times (n-1)$ matrices, and let rows $r$ and
$s$ of $A \in \mathsf{M}_{n\times n}(F)$ be identical for $r \neq s$. Because $n \ge 3$, we can choose an
integer $i$ ($1 \le i \le n$) other than $r$ and $s$. Now

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \cdot \det(\tilde{A}_{ij})$$

by Theorem 4.4. Since each $\tilde{A}_{ij}$ is an $(n-1) \times (n-1)$ matrix with two
identical rows, the induction hypothesis implies that each $\det(\tilde{A}_{ij}) = 0$, and
hence $\det(A) = 0$. This completes the proof for $n \times n$ matrices, and so the
corollary is true for all square matrices by mathematical induction. ∎


It is possible to evaluate determinants more efficiently by combining co- 
factor expansion with the use of elementary row operations. Before such a 
process can be developed, we need to learn what happens to the determinant 
of a matrix if we perform an elementary row operation on that matrix. The- 
orem 4.3 provides this information for elementary row operations of type 2 
(those in which a row is multiplied by a nonzero scalar). Next we turn our 
attention to elementary row operations of type 1 (those in which two rows 
are interchanged). 
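Theorem 4.5. If $A \in \mathsf{M}_{n\times n}(F)$ and $B$ is a matrix obtained from $A$ by
interchanging any two rows of $A$, then $\det(B) = -\det(A)$.

Proof. Let the rows of $A \in \mathsf{M}_{n\times n}(F)$ be $a_1, a_2, \ldots, a_n$, and let $B$ be the
matrix obtained from $A$ by interchanging rows $r$ and $s$, where $r < s$. Thus

$$A = \begin{pmatrix} a_1 \\ \vdots \\ a_r \\ \vdots \\ a_s \\ \vdots \\ a_n \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} a_1 \\ \vdots \\ a_s \\ \vdots \\ a_r \\ \vdots \\ a_n \end{pmatrix}.$$

Consider the matrix obtained from $A$ by replacing rows $r$ and $s$ by $a_r + a_s$.
By the corollary to Theorem 4.4 and Theorem 4.3, we have

$$\begin{aligned}
0 = \det\begin{pmatrix} a_1 \\ \vdots \\ a_r + a_s \\ \vdots \\ a_r + a_s \\ \vdots \\ a_n \end{pmatrix} &= \det\begin{pmatrix} a_1 \\ \vdots \\ a_r \\ \vdots \\ a_r + a_s \\ \vdots \\ a_n \end{pmatrix} + \det\begin{pmatrix} a_1 \\ \vdots \\ a_s \\ \vdots \\ a_r + a_s \\ \vdots \\ a_n \end{pmatrix} \\
&= \det\begin{pmatrix} a_1 \\ \vdots \\ a_r \\ \vdots \\ a_r \\ \vdots \\ a_n \end{pmatrix} + \det\begin{pmatrix} a_1 \\ \vdots \\ a_r \\ \vdots \\ a_s \\ \vdots \\ a_n \end{pmatrix} + \det\begin{pmatrix} a_1 \\ \vdots \\ a_s \\ \vdots \\ a_r \\ \vdots \\ a_n \end{pmatrix} + \det\begin{pmatrix} a_1 \\ \vdots \\ a_s \\ \vdots \\ a_s \\ \vdots \\ a_n \end{pmatrix} \\
&= 0 + \det(A) + \det(B) + 0.
\end{aligned}$$

Therefore $\det(B) = -\det(A)$. ∎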


We now complete our investigation of how an elementary row operation 
affects the determinant of a matrix by showing that elementary row operations 
of type 3 do not change the determinant of a matrix. 


Theorem 4.6. Let $A \in \mathsf{M}_{n\times n}(F)$, and let $B$ be a matrix obtained by
adding a multiple of one row of $A$ to another row of $A$. Then $\det(B) = \det(A)$.

Proof. Suppose that $B$ is the $n \times n$ matrix obtained from $A$ by adding $k$
times row $r$ to row $s$, where $r \neq s$. Let the rows of $A$ be $a_1, a_2, \ldots, a_n$, and
the rows of $B$ be $b_1, b_2, \ldots, b_n$. Then $b_i = a_i$ for $i \neq s$ and $b_s = a_s + ka_r$.
Let $C$ be the matrix obtained from $A$ by replacing row $s$ with $a_r$. Applying
Theorem 4.3 to row $s$ of $B$, we obtain

$$\det(B) = \det(A) + k\det(C) = \det(A)$$

because $\det(C) = 0$ by the corollary to Theorem 4.4. ∎


In Theorem 4.2 (p. 201), we proved that a 2 x 2 matrix is invertible if 
and only if its determinant is nonzero. As a consequence of Theorem 4.6, we 
can prove half of the promised generalization of this result in the following 
corollary. The converse is proved in the corollary to Theorem 4.7. 


Corollary. If $A \in \mathsf{M}_{n\times n}(F)$ has rank less than $n$, then $\det(A) = 0$.

Proof. If the rank of $A$ is less than $n$, then the rows $a_1, a_2, \ldots, a_n$ of $A$ are
linearly dependent. By Exercise 14 of Section 1.5, some row of $A$, say, row $r$,
is a linear combination of the other rows. So there exist scalars $c_i$ such that

$$a_r = c_1a_1 + \cdots + c_{r-1}a_{r-1} + c_{r+1}a_{r+1} + \cdots + c_na_n.$$

Let $B$ be the matrix obtained from $A$ by adding $-c_i$ times row $i$ to row $r$ for
each $i \neq r$. Then row $r$ of $B$ consists entirely of zeros, and so $\det(B) = 0$.
But by Theorem 4.6, $\det(B) = \det(A)$. Hence $\det(A) = 0$. ∎


The following rules summarize the effect of an elementary row operation
on the determinant of a matrix $A \in \mathsf{M}_{n\times n}(F)$.

(a) If $B$ is a matrix obtained by interchanging any two rows of $A$, then
$\det(B) = -\det(A)$.

(b) If $B$ is a matrix obtained by multiplying a row of $A$ by a nonzero scalar
$k$, then $\det(B) = k\det(A)$.

(c) If $B$ is a matrix obtained by adding a multiple of one row of $A$ to another
row of $A$, then $\det(B) = \det(A)$.
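The three rules can be observed on the matrix of Example 1, whose determinant is 40. In this Python sketch (illustrative only), `det` is a hypothetical helper implementing cofactor expansion along the first row, and the scalar 5 and the multiple 2 are arbitrary choices.

```python
def minor(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


A = [[1, 3, -3], [-3, -5, 2], [-4, 4, -6]]  # det(A) = 40

swapped = [A[1], A[0], A[2]]                                   # (a) interchange rows 1 and 2
scaled = [A[0], [5 * x for x in A[1]], A[2]]                   # (b) multiply row 2 by k = 5
added = [A[0], A[1], [x + 2 * y for x, y in zip(A[2], A[0])]]  # (c) add 2 * row 1 to row 3

print(det(A), det(swapped), det(scaled), det(added))  # 40 -40 200 40
```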


These facts can be used to simplify the evaluation of a determinant. Consider,
for instance, the matrix in Example 1:

$$A = \begin{pmatrix} 1 & 3 & -3 \\ -3 & -5 & 2 \\ -4 & 4 & -6 \end{pmatrix}.$$

Adding 3 times row 1 of $A$ to row 2 and 4 times row 1 to row 3, we obtain

$$M = \begin{pmatrix} 1 & 3 & -3 \\ 0 & 4 & -7 \\ 0 & 16 & -18 \end{pmatrix}.$$

Since $M$ was obtained by performing two type 3 elementary row operations
on $A$, we have $\det(A) = \det(M)$. The cofactor expansion of $M$ along the first
row gives

$$\det(M) = (-1)^{1+1}(1) \cdot \det(\tilde{M}_{11}) + (-1)^{1+2}(3) \cdot \det(\tilde{M}_{12}) + (-1)^{1+3}(-3) \cdot \det(\tilde{M}_{13}).$$

Both $\tilde{M}_{12}$ and $\tilde{M}_{13}$ have a column consisting entirely of zeros, and so
$\det(\tilde{M}_{12}) = \det(\tilde{M}_{13}) = 0$ by the corollary to Theorem 4.6. Hence

$$\det(M) = (-1)^{1+1}(1) \cdot \det(\tilde{M}_{11}) = (-1)^{1+1}(1) \cdot \det\begin{pmatrix} 4 & -7 \\ 16 & -18 \end{pmatrix} = 1[4(-18) - (-7)(16)] = 40.$$

Thus with the use of two elementary row operations of type 3, we have reduced
the computation of $\det(A)$ to the evaluation of one determinant of a $2 \times 2$
matrix.

But we can do even better. If we add $-4$ times row 2 of $M$ to row 3
(another elementary row operation of type 3), we obtain

$$P = \begin{pmatrix} 1 & 3 & -3 \\ 0 & 4 & -7 \\ 0 & 0 & 10 \end{pmatrix}.$$

Evaluating $\det(P)$ by cofactor expansion along the first row, we have

$$\det(P) = (-1)^{1+1}(1) \cdot \det(\tilde{P}_{11}) = (-1)^{1+1}(1) \cdot \det\begin{pmatrix} 4 & -7 \\ 0 & 10 \end{pmatrix} = 1 \cdot 4 \cdot 10 = 40,$$

as described earlier. Since $\det(A) = \det(M) = \det(P)$, it follows that
$\det(A) = 40$.

The preceding calculation of det(P) illustrates an important general fact. 
The determinant of an upper triangular matrix is the product of its diagonal 
entries. (See Exercise 23.) By using elementary row operations of types 1 
and 3 only, we can transform any square matrix into an upper triangular 
matrix, and so we can easily evaluate the determinant of any square matrix. 
The next two examples illustrate this technique. 


Example 5
To evaluate the determinant of the matrix

$$B = \begin{pmatrix} 0 & 1 & 3 \\ -2 & -3 & -5 \\ 4 & -4 & 4 \end{pmatrix}$$

in Example 2, we must begin with a row interchange. Interchanging rows 1
and 2 of $B$ produces

$$C = \begin{pmatrix} -2 & -3 & -5 \\ 0 & 1 & 3 \\ 4 & -4 & 4 \end{pmatrix}.$$

By means of a sequence of elementary row operations of type 3, we can
transform $C$ into an upper triangular matrix:

$$\begin{pmatrix} -2 & -3 & -5 \\ 0 & 1 & 3 \\ 4 & -4 & 4 \end{pmatrix} \longrightarrow \begin{pmatrix} -2 & -3 & -5 \\ 0 & 1 & 3 \\ 0 & -10 & -6 \end{pmatrix} \longrightarrow \begin{pmatrix} -2 & -3 & -5 \\ 0 & 1 & 3 \\ 0 & 0 & 24 \end{pmatrix}.$$

Thus $\det(C) = -2 \cdot 1 \cdot 24 = -48$. Since $C$ was obtained from $B$ by an interchange
of rows, it follows that

$$\det(B) = -\det(C) = 48.$$

◆


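Example 6

The technique in Example 5 can be used to evaluate the determinant of the
matrix

$$C = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 3 & -3 \\ -2 & -3 & -5 & 2 \\ 4 & -4 & 4 & -6 \end{pmatrix}$$

in Example 3. This matrix can be transformed into an upper triangular
matrix by means of the following sequence of elementary row operations of
type 3:

$$\begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 3 & -3 \\ -2 & -3 & -5 & 2 \\ 4 & -4 & 4 & -6 \end{pmatrix} \longrightarrow \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 3 & -3 \\ 0 & -3 & -5 & 3 \\ 0 & -4 & 4 & -8 \end{pmatrix} \longrightarrow \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 3 & -3 \\ 0 & 0 & 4 & -6 \\ 0 & 0 & 16 & -20 \end{pmatrix} \longrightarrow \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 3 & -3 \\ 0 & 0 & 4 & -6 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$

Thus $\det(C) = 2 \cdot 1 \cdot 4 \cdot 4 = 32$. ◆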
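The method of Examples 5 and 6 is essentially Gaussian elimination. The Python sketch below is a minimal version under stated assumptions: exact `Fraction` arithmetic, the first nonzero entry in each column chosen as the pivot, and only type 1 and type 3 operations, so that the determinant is plus or minus the product of the diagonal entries of the resulting upper triangular matrix (Exercise 23). The function name `det_elimination` is hypothetical.

```python
from fractions import Fraction


def det_elimination(a):
    """Determinant via reduction to upper triangular form.

    Row interchanges (type 1) flip the sign; adding a multiple of one row
    to another (type 3) leaves the determinant unchanged.
    """
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    sign = 1
    for col in range(n):
        # First row at or below `col` with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            # The triangular form would have a zero diagonal entry here.
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]  # type 1 operation
            sign = -sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # type 3 operation: row r minus factor times row col
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]  # product of the diagonal entries
    return result


print(det_elimination([[0, 1, 3], [-2, -3, -5], [4, -4, 4]]))  # 48 (Example 5)
print(det_elimination([[2, 0, 0, 1], [0, 1, 3, -3],
                       [-2, -3, -5, 2], [4, -4, 4, -6]]))      # 32 (Example 6)
```

Numerical libraries typically use a floating-point LU factorization with partial pivoting instead; exact fractions are used here only so that the results match the hand computations.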


Using elementary row operations to evaluate the determinant of a matrix,
as illustrated in Example 6, is far more efficient than using cofactor expansion.
Consider first the evaluation of the determinant of a $2 \times 2$ matrix. Since

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc,$$

the evaluation of the determinant of a $2 \times 2$ matrix requires 2 multiplications
(and 1 subtraction). For $n \ge 3$, evaluating the determinant of an $n \times n$ matrix
by cofactor expansion along any row expresses the determinant as a sum of $n$
products involving determinants of $(n-1) \times (n-1)$ matrices. Thus in all, the
evaluation of the determinant of an $n \times n$ matrix by cofactor expansion along
any row requires over $n!$ multiplications, whereas evaluating the determinant
of an $n \times n$ matrix by elementary row operations as in Examples 5 and 6
can be shown to require only $(n^3 + 2n - 3)/3$ multiplications. To evaluate
the determinant of a $20 \times 20$ matrix, which is not large by present standards,
cofactor expansion along a row requires over $20! \approx 2.4 \times 10^{18}$ multiplications.
Thus it would take a computer performing one billion multiplications
per second over 77 years to evaluate the determinant of a $20 \times 20$ matrix by
this method. By contrast, the method using elementary row operations requires
only 2679 multiplications for this calculation and would take the same
computer less than three-millionths of a second! It is easy to see why most
computer programs for evaluating the determinant of an arbitrary matrix do
not use cofactor expansion.
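The operation counts quoted above are easy to reproduce. In this Python sketch, the recurrence for cofactor expansion is an assumed counting model (a $1 \times 1$ determinant costs nothing, and a $k \times k$ one costs $k$ products plus the work for its $k$ minors, ignoring the signs); the elimination count is the formula $(n^3 + 2n - 3)/3$ quoted in the text, not derived here. The function names are hypothetical.

```python
import math


def cofactor_multiplications(n):
    """Multiplications used by naive cofactor expansion (assumed model).

    A 1 x 1 determinant costs nothing; a k x k determinant costs k
    products A_1j * det(minor) plus the cost of its k minors.
    """
    count = 0  # cost for k = 1
    for k in range(2, n + 1):
        count = k * (count + 1)
    return count


def elimination_multiplications(n):
    """The count (n^3 + 2n - 3)/3 quoted in the text for row reduction."""
    return (n ** 3 + 2 * n - 3) // 3  # always an exact integer


RATE = 10 ** 9                            # multiplications per second
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

n = 20
print(math.factorial(n))                                # 2432902008176640000
print(cofactor_multiplications(n) > math.factorial(n))  # True
print(math.factorial(n) / RATE / SECONDS_PER_YEAR)      # about 77.1 (years)
print(elimination_multiplications(n))                   # 2679
print(elimination_multiplications(n) / RATE)            # about 2.7e-06 (seconds)
```

Under this model `cofactor_multiplications(2)` is 2, matching the count given for $2 \times 2$ matrices.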

In this section, we have defined the determinant of a square matrix in 
terms of cofactor expansion along the first row. We then showed that the 
determinant of a square matrix can be evaluated using cofactor expansion 
along any row. In addition, we showed that the determinant possesses a 
number of special properties, including properties that enable us to calculate 
det(B) from det(A) whenever B is a matrix obtained from A by means of an 
elementary row operation. These properties enable us to evaluate determi- 
nants much more efficiently. In the next section, we continue this approach 
to discover additional properties of determinants. 


EXERCISES 


1. Label the following statements as true or false. 


(a) The function $\det\colon \mathsf{M}_{n\times n}(F) \to F$ is a linear transformation.

(b) The determinant of a square matrix can be evaluated by cofactor
expansion along any row.

(c) If two rows of a square matrix $A$ are identical, then $\det(A) = 0$.

(d) If $B$ is a matrix obtained from a square matrix $A$ by interchanging
any two rows, then $\det(B) = -\det(A)$.

(e) If $B$ is a matrix obtained from a square matrix $A$ by multiplying
a row of $A$ by a scalar, then $\det(B) = \det(A)$.

(f) If $B$ is a matrix obtained from a square matrix $A$ by adding $k$
times row $i$ to row $j$, then $\det(B) = k\det(A)$.

(g) If $A \in \mathsf{M}_{n\times n}(F)$ has rank $n$, then $\det(A) = 0$.

(h) The determinant of an upper triangular matrix equals the product
of its diagonal entries.
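2. Find the value of $k$ that satisfies the following equation:

$$\det\begin{pmatrix} 3a_1 & 3a_2 & 3a_3 \\ 3b_1 & 3b_2 & 3b_3 \\ 3c_1 & 3c_2 & 3c_3 \end{pmatrix} = k\det\begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}$$

3. Find the value of $k$ that satisfies the following equation:

$$\det\begin{pmatrix} 2a_1 & 2a_2 & 2a_3 \\ 3b_1 + 5c_1 & 3b_2 + 5c_2 & 3b_3 + 5c_3 \\ 7c_1 & 7c_2 & 7c_3 \end{pmatrix} = k\det\begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}$$

4. Find the value of $k$ that satisfies the following equation:

$$\det\begin{pmatrix} b_1 + c_1 & b_2 + c_2 & b_3 + c_3 \\ a_1 + c_1 & a_2 + c_2 & a_3 + c_3 \\ a_1 + b_1 & a_2 + b_2 & a_3 + b_3 \end{pmatrix} = k\det\begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}$$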


In Exercises 5-12, evaluate the determinant of the given matrix by cofactor 
expansion along the indicated row. 


0 1 2 1 0 2 
—-l1 0 -8 0 1 5 
ae ie ae aes ee 
along the first row along the first row 
0 1 2 1 0 2 
-l1 oO -8 0 1 5 
. y 3. 0 an ec 
along the second row along the third row 
O l1+i 2 a 2+7 O 
—2i O tl1-i -1 8 2i 
* \ 3 4 0 Pee ag? ofa: Shea 
along the third row along the second row 
0 2 1 = 8 1-1 2 -1 
1 0 —2 2 -3 4 1 -!1 
11. 3 -l1 0 1 12. 2 -5 -3 8 
-1 1 220. £0 —2 6 -4 1 
along the fourth row along the fourth row 


In Exercises 13-22, evaluate the determinant of the given matrix by any le- 
gitimate method. 




13. 
0 0 1 2. B04

0 2 8 14.15 6 O 

4 5 6 7 0 O 

1 2 3 —1 3 

4 5 6 16. 4 —8 1 

7 8 9 2 2 5 

0 1 1 1 -2 3 

1 ; = 18. | —-1 2 —5 

6 — 3. -l 2 
a —-1 241 3 
3 20. | 1-2 1 1 

—21 37 2 —1+i 
1 0 —2 3 1 —-2 3 —12 

—3 1 1 2 22 —5 12 —-14 19 
0 4 —-1 1 “1-9 22 —20 31 
2 3 0 1 —4 9 —-14 15 


23. Prove that the determinant of an upper triangular matrix is the product
of its diagonal entries.

24. Prove the corollary to Theorem 4.3.

25. Prove that $\det(kA) = k^n\det(A)$ for any $A \in \mathsf{M}_{n\times n}(F)$.

26. Let $A \in \mathsf{M}_{n\times n}(F)$. Under what conditions is $\det(-A) = \det(A)$?

27. Prove that if $A \in \mathsf{M}_{n\times n}(F)$ has two identical columns, then $\det(A) = 0$.

28. Compute $\det(E_i)$ if $E_i$ is an elementary matrix of type $i$.

29. Prove that if $E$ is an elementary matrix, then $\det(E^t) = \det(E)$.

30. Let the rows of $A \in \mathsf{M}_{n\times n}(F)$ be $a_1, a_2, \ldots, a_n$, and let $B$ be the matrix
in which the rows are $a_n, a_{n-1}, \ldots, a_1$. Calculate $\det(B)$ in terms of
$\det(A)$.


4.3 PROPERTIES OF DETERMINANTS


In Theorem 3.1, we saw that performing an elementary row operation on
a matrix can be accomplished by multiplying the matrix by an elementary
matrix. This result is very useful in studying the effects on the determinant of
applying a sequence of elementary row operations. Because the determinant
of the $n \times n$ identity matrix is 1 (see Example 4 in Section 4.2), we can interpret
the statements on page 217 as the following facts about the determinants of
elementary matrices.

(a) If $E$ is an elementary matrix obtained by interchanging any two rows
of $I$, then $\det(E) = -1$.

(b) If $E$ is an elementary matrix obtained by multiplying some row of $I$ by
the nonzero scalar $k$, then $\det(E) = k$.

(c) If $E$ is an elementary matrix obtained by adding a multiple of some row
of $I$ to another row, then $\det(E) = 1$.


We now apply these facts about determinants of elementary matrices to 
prove that the determinant is a multiplicative function. 


Theorem 4.7. For any $A, B \in \mathsf{M}_{n\times n}(F)$, $\det(AB) = \det(A) \cdot \det(B)$.

Proof. We begin by establishing the result when $A$ is an elementary matrix.
If $A$ is an elementary matrix obtained by interchanging two rows of $I$, then
$\det(A) = -1$. But by Theorem 3.1 (p. 149), $AB$ is a matrix obtained by
interchanging two rows of $B$. Hence by Theorem 4.5 (p. 216), $\det(AB) =
-\det(B) = \det(A) \cdot \det(B)$. Similar arguments establish the result when $A$
is an elementary matrix of type 2 or type 3. (See Exercise 18.)

If $A$ is an $n \times n$ matrix with rank less than $n$, then $\det(A) = 0$ by the
corollary to Theorem 4.6 (p. 216). Since $\operatorname{rank}(AB) \le \operatorname{rank}(A) < n$ by Theorem 3.7
(p. 159), we have $\det(AB) = 0$. Thus $\det(AB) = \det(A) \cdot \det(B)$ in
this case.

On the other hand, if $A$ has rank $n$, then $A$ is invertible and hence is
the product of elementary matrices (Corollary 3 to Theorem 3.6, p. 159), say,
$A = E_m \cdots E_2E_1$. The first paragraph of this proof shows that

$$\begin{aligned}
\det(AB) &= \det(E_m \cdots E_2E_1B) \\
&= \det(E_m) \cdot \det(E_{m-1} \cdots E_2E_1B) \\
&\;\;\vdots \\
&= \det(E_m) \cdot \cdots \cdot \det(E_2) \cdot \det(E_1) \cdot \det(B) \\
&= \det(E_m \cdots E_2E_1) \cdot \det(B) \\
&= \det(A) \cdot \det(B).
\end{aligned}$$

∎
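Theorem 4.7 can be checked on the matrices of Examples 1 and 2 of Section 4.2. A minimal Python sketch with hypothetical helpers `det` (first-row cofactor expansion) and `matmul`; note that $AB \neq BA$ here, yet both products have determinant $\det(A)\det(B) = 40 \cdot 48 = 1920$.

```python
def minor(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def matmul(a, b):
    """Product of two n x n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 3, -3], [-3, -5, 2], [-4, 4, -6]]  # Example 1 of Section 4.2
B = [[0, 1, 3], [-2, -3, -5], [4, -4, 4]]   # Example 2 of Section 4.2

print(matmul(A, B) == matmul(B, A))          # False
print(det(matmul(A, B)), det(matmul(B, A)))  # 1920 1920
print(det(A) * det(B))                       # 1920
```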
Corollary. A matrix $A \in \mathsf{M}_{n\times n}(F)$ is invertible if and only if $\det(A) \neq 0$.
Furthermore, if $A$ is invertible, then $\det(A^{-1}) = \dfrac{1}{\det(A)}$.

Proof. If $A \in \mathsf{M}_{n\times n}(F)$ is not invertible, then the rank of $A$ is less than $n$.
So $\det(A) = 0$ by the corollary to Theorem 4.6 (p. 217). On the other hand,
if $A \in \mathsf{M}_{n\times n}(F)$ is invertible, then

$$\det(A) \cdot \det(A^{-1}) = \det(AA^{-1}) = \det(I) = 1$$

by Theorem 4.7. Hence $\det(A) \neq 0$ and $\det(A^{-1}) = \dfrac{1}{\det(A)}$. ∎


In our discussion of determinants until now, we have used only the rows
of a matrix. For example, the recursive definition of a determinant involved
cofactor expansion along a row, and the more efficient method developed in
Section 4.2 used elementary row operations. Our next result shows that the
determinants of $A$ and $A^t$ are always equal. Since the rows of $A$ are the
columns of $A^t$, this fact enables us to translate any statement about determinants
that involves the rows of a matrix into a corresponding statement that
involves its columns.


Theorem 4.8. For any $A \in \mathsf{M}_{n\times n}(F)$, $\det(A^t) = \det(A)$.

Proof. If $A$ is not invertible, then $\operatorname{rank}(A) < n$. But $\operatorname{rank}(A^t) = \operatorname{rank}(A)$
by Corollary 2 to Theorem 3.6 (p. 158), and so $A^t$ is not invertible. Thus
$\det(A^t) = 0 = \det(A)$ in this case.

On the other hand, if $A$ is invertible, then $A$ is a product of elementary
matrices, say $A = E_m \cdots E_2E_1$. Since $\det(E_i) = \det(E_i^t)$ for every $i$ by
Exercise 29 of Section 4.2, by Theorem 4.7 we have

$$\begin{aligned}
\det(A^t) &= \det(E_1^tE_2^t \cdots E_m^t) \\
&= \det(E_1^t) \cdot \det(E_2^t) \cdot \cdots \cdot \det(E_m^t) \\
&= \det(E_1) \cdot \det(E_2) \cdot \cdots \cdot \det(E_m) \\
&= \det(E_m) \cdot \cdots \cdot \det(E_2) \cdot \det(E_1) \\
&= \det(E_m \cdots E_2E_1) \\
&= \det(A).
\end{aligned}$$

Thus, in either case, $\det(A^t) = \det(A)$. ∎
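By Theorem 4.8, cofactor expansion along a column gives the determinant as well. The following Python sketch (hypothetical helper names, 0-based indices) checks $\det(C^t) = \det(C)$ for the matrix $C$ of Example 3 of Section 4.2 and expands $C$ along each of its columns.

```python
def minor(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def transpose(a):
    """The rows of the result are the columns of a."""
    return [list(col) for col in zip(*a)]


def det_along_column(a, j):
    """Cofactor expansion along column j (0-based); valid by Theorem 4.8."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j)) for i in range(n))


C = [[2, 0, 0, 1], [0, 1, 3, -3], [-2, -3, -5, 2], [4, -4, 4, -6]]
print(det(C), det(transpose(C)))                   # 32 32
print([det_along_column(C, j) for j in range(4)])  # [32, 32, 32, 32]
```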


Among the many consequences of Theorem 4.8 are that determinants can 
be evaluated by cofactor expansion along a column, and that elementary col- 
umn operations can be used as well as elementary row operations in evaluating 
a determinant. (The effect on the determinant of performing an elementary 
column operation is the same as the effect of performing the corresponding 
elementary row operation.) We conclude our discussion of determinant prop- 
erties with a well-known result that relates determinants to the solutions of 
certain types of systems of linear equations. 


Theorem 4.9 (Cramer’s Rule). Let $Ax = b$ be the matrix form of
a system of $n$ linear equations in $n$ unknowns, where $x = (x_1, x_2, \ldots, x_n)^t$.
If $\det(A) \neq 0$, then this system has a unique solution, and for each $k$
($k = 1, 2, \ldots, n$),

$$x_k = \frac{\det(M_k)}{\det(A)},$$

where $M_k$ is the $n \times n$ matrix obtained from $A$ by replacing column $k$ of $A$
by $b$.

Proof. If $\det(A) \neq 0$, then the system $Ax = b$ has a unique solution by
the corollary to Theorem 4.7 and Theorem 3.10 (p. 174). For each integer $k$
($1 \le k \le n$), let $a_k$ denote the $k$th column of $A$ and $X_k$ denote the matrix
obtained from the $n \times n$ identity matrix by replacing column $k$ by $x$. Then
by Theorem 2.13 (p. 90), $AX_k$ is the $n \times n$ matrix whose $i$th column is

$$Ae_i = a_i \ \text{ if } i \neq k \qquad\text{and}\qquad Ax = b \ \text{ if } i = k.$$

Thus $AX_k = M_k$. Evaluating $\det(X_k)$ by cofactor expansion along row $k$ produces

$$\det(X_k) = x_k \cdot \det(I_{n-1}) = x_k.$$

Hence by Theorem 4.7,

$$\det(M_k) = \det(AX_k) = \det(A) \cdot \det(X_k) = \det(A) \cdot x_k.$$

Therefore

$$x_k = [\det(A)]^{-1} \cdot \det(M_k).$$

∎


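Example 1

We illustrate Theorem 4.9 by using Cramer's rule to solve the following system of linear equations:

$$\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 2\\
x_1 + x_3 &= 3\\
x_1 + x_2 - x_3 &= 1.
\end{aligned}$$

The matrix form of this system of linear equations is $Ax = b$, where

$$A = \begin{pmatrix} 1 & 2 & 3\\ 1 & 0 & 1\\ 1 & 1 & -1 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 2\\ 3\\ 1 \end{pmatrix}.$$

Because $\det(A) = 6 \neq 0$, Cramer's rule applies. Using the notation of Theorem 4.9, we have

$$x_1 = \frac{\det(M_1)}{\det(A)} = \frac{\det\begin{pmatrix} 2 & 2 & 3\\ 3 & 0 & 1\\ 1 & 1 & -1 \end{pmatrix}}{\det(A)} = \frac{15}{6} = \frac{5}{2},$$

$$x_2 = \frac{\det(M_2)}{\det(A)} = \frac{\det\begin{pmatrix} 1 & 2 & 3\\ 1 & 3 & 1\\ 1 & 1 & -1 \end{pmatrix}}{\det(A)} = \frac{-6}{6} = -1,$$

and

$$x_3 = \frac{\det(M_3)}{\det(A)} = \frac{\det\begin{pmatrix} 1 & 2 & 2\\ 1 & 0 & 3\\ 1 & 1 & 1 \end{pmatrix}}{\det(A)} = \frac{3}{6} = \frac{1}{2}.$$

Thus the unique solution to the given system of linear equations is

$$(x_1, x_2, x_3) = \left(\frac{5}{2}, -1, \frac{1}{2}\right). \qquad ◆$$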
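Cramer's rule translates directly into code. The following is a minimal sketch that assumes square integer or rational input. It uses exact `fractions.Fraction` arithmetic and a recursive first-row cofactor expansion. The function names `det` and `cramer` are hypothetical. It reproduces the solution of Example 1.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (exact for int/Fraction entries)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]  # delete row 1 and column j
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; requires det(A) != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0, so Cramer's rule does not apply")
    x = []
    for k in range(len(A)):
        # M_k: the matrix A with column k replaced by b
        M_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]
        x.append(Fraction(det(M_k), d))
    return x

A = [[1, 2, 3], [1, 0, 1], [1, 1, -1]]
b = [2, 3, 1]
print(cramer(A, b))  # [Fraction(5, 2), Fraction(-1, 1), Fraction(1, 2)]
```

Each call to `det` here does on the order of $n!$ work, and Cramer's rule needs $n + 1$ of them. The sketch is only practical for very small systems, which the next paragraph discusses.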


In applications involving systems of linear equations, we sometimes need to know that there is a solution in which the unknowns are integers. In this situation, Cramer's rule can be useful because it implies that a system of linear equations with integral coefficients has an integral solution if the determinant of its coefficient matrix is $\pm 1$. On the other hand, Cramer's rule is not useful for computation because it requires evaluating $n + 1$ determinants of $n \times n$ matrices to solve a system of $n$ linear equations in $n$ unknowns. The amount of computation to do this is far greater than that required to solve the system by the method of Gaussian elimination, which was discussed in Section 3.4. Thus Cramer's rule is primarily of theoretical and aesthetic interest, rather than of computational value.
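For comparison, here is a sketch of solving the same system by Gaussian elimination with back substitution, the method the text recommends for actual computation. It assumes an invertible square coefficient matrix and uses exact `Fraction` arithmetic. The function name `solve_gauss` is hypothetical. The second system is also hypothetical: it has integer coefficients and a coefficient matrix of determinant 1, illustrating the integrality remark above.

```python
from fractions import Fraction

def solve_gauss(A, b):
    """Solve Ax = b for square, invertible A by Gaussian elimination with exact arithmetic."""
    n = len(A)
    # augmented matrix [A | b] with Fraction entries
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # bring a row with a nonzero entry in this column into pivot position
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # eliminate the entries below the pivot
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [M[r][c] - factor * M[col][c] for c in range(n + 1)]
    # back substitution on the upper triangular system
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

print(solve_gauss([[1, 2, 3], [1, 0, 1], [1, 1, -1]], [2, 3, 1]))
# [Fraction(5, 2), Fraction(-1, 1), Fraction(1, 2)]
print(solve_gauss([[2, 1], [1, 1]], [3, 5]))  # determinant 1: integer solution
# [Fraction(-2, 1), Fraction(7, 1)]
```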

As in Section 4.1, it is possible to interpret the determinant of a matrix $A \in M_{n\times n}(\mathbb{R})$ geometrically. If the rows of $A$ are $a_1, a_2, \ldots, a_n$, respectively, then $|\det(A)|$ is the $n$-dimensional volume (the generalization of area in $\mathbb{R}^2$ and volume in $\mathbb{R}^3$) of the parallelepiped having the vectors $a_1, a_2, \ldots, a_n$ as adjacent sides. (For a proof of a more generalized result, see Jerrold E. Marsden and Michael J. Hoffman, Elementary Classical Analysis, W.H. Freeman and Company, New York, 1993, p. 524.)


Example 2

The volume of the parallelepiped having the vectors $a_1 = (1, -2, 1)$, $a_2 = (1, 0, -1)$, and $a_3 = (1, 1, 1)$ as adjacent sides is

$$\left|\det\begin{pmatrix} 1 & -2 & 1\\ 1 & 0 & -1\\ 1 & 1 & 1 \end{pmatrix}\right| = 6.$$

Note that the object in question is a rectangular parallelepiped (see Figure 4.6) with sides of lengths $\sqrt{6}$, $\sqrt{2}$, and $\sqrt{3}$. Hence by the familiar formula for volume, its volume should be $\sqrt{6}\cdot\sqrt{2}\cdot\sqrt{3} = 6$, as the determinant calculation shows. ◆
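The computation in Example 2 can be checked numerically. This is a minimal sketch that assumes the three vectors of the example. It evaluates the $3 \times 3$ determinant by expansion along the first row, confirms that the vectors are pairwise orthogonal, and compares $|\det|$ with the product of the side lengths. The helper names are hypothetical.

```python
import math

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c (expansion along the first row)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

a1, a2, a3 = (1, -2, 1), (1, 0, -1), (1, 1, 1)
volume = abs(det3(a1, a2, a3))
print(volume)  # 6

# Pairwise orthogonal rows: the parallelepiped is rectangular, so its volume
# is also the product of the side lengths sqrt(6), sqrt(2), sqrt(3).
assert dot(a1, a2) == dot(a1, a3) == dot(a2, a3) == 0
lengths = [math.sqrt(dot(v, v)) for v in (a1, a2, a3)]
print(math.isclose(math.prod(lengths), volume))  # True
```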


Figure 4.6: Parallelepiped determined by three vectors in $\mathbb{R}^3$ (adjacent sides $(1, -2, 1)$, $(1, 0, -1)$, and $(1, 1, 1)$).

In our earlier discussion of the geometric significance of the determinant formed from the vectors in an ordered basis for $\mathbb{R}^2$, we also saw that this determinant is positive if and only if the basis induces a right-handed coordinate system. A similar statement is true in $\mathbb{R}^n$. Specifically, if $\gamma$ is any ordered basis for $\mathbb{R}^n$ and $\beta$ is the standard ordered basis for $\mathbb{R}^n$, then $\gamma$ induces a right-handed coordinate system if and only if $\det(Q) > 0$, where $Q$ is the change of coordinate matrix changing $\gamma$-coordinates into $\beta$-coordinates.
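Thus, for instance,

$$\gamma = \left\{ \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix} \right\}$$

induces a left-handed coordinate system in $\mathbb{R}^3$ because

$$\det\begin{pmatrix} 1 & 1 & 0\\ 1 & -1 & 0\\ 0 & 0 & 1 \end{pmatrix} = -2 < 0,$$

whereas

$$\gamma = \left\{ \begin{pmatrix} 1\\ 2\\ 0 \end{pmatrix}, \begin{pmatrix} -2\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix} \right\}$$

induces a right-handed coordinate system in $\mathbb{R}^3$ because

$$\det\begin{pmatrix} 1 & -2 & 0\\ 2 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} = 5 > 0.$$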
More generally, if $\beta$ and $\gamma$ are two ordered bases for $\mathbb{R}^n$, then the coordinate systems induced by $\beta$ and $\gamma$ have the same orientation (either both are right-handed or both are left-handed) if and only if $\det(Q) > 0$, where $Q$ is the change of coordinate matrix changing $\gamma$-coordinates into $\beta$-coordinates.
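The orientation test can be sketched in a few lines. When $\beta$ is the standard ordered basis, the change of coordinate matrix $Q$ from $\gamma$-coordinates to $\beta$-coordinates is simply the matrix whose columns are the vectors of $\gamma$. For two arbitrary ordered bases, the sketch uses the relation $Q = B^{-1}C$, where $B$ and $C$ have the vectors of $\beta$ and $\gamma$ as columns, so that $\det(Q) = \det(C)/\det(B)$. All function names are hypothetical, and the bases are the two examples above.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def columns_matrix(basis):
    """Matrix whose j-th column is the j-th vector of the ordered basis."""
    return [list(row) for row in zip(*basis)]

def is_right_handed(basis):
    # With beta standard, Q (gamma-coordinates -> beta-coordinates) has the basis vectors as columns.
    return det3(columns_matrix(basis)) > 0

def same_orientation(beta, gamma):
    # det(B^{-1} C) = det(C) / det(B) is positive exactly when det(B), det(C) share a sign.
    return det3(columns_matrix(beta)) * det3(columns_matrix(gamma)) > 0

gamma1 = [(1, 1, 0), (1, -1, 0), (0, 0, 1)]
gamma2 = [(1, 2, 0), (-2, 1, 0), (0, 0, 1)]
print(det3(columns_matrix(gamma1)), is_right_handed(gamma1))  # -2 False
print(det3(columns_matrix(gamma2)), is_right_handed(gamma2))  # 5 True
print(same_orientation(gamma1, gamma2))                       # False
```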


EXERCISES

1. Label the following statements as true or false.

(a) If $E$ is an elementary matrix, then $\det(E) = \pm 1$.
(b) For any $A, B \in M_{n\times n}(F)$, $\det(AB) = \det(A)\cdot\det(B)$.
(c) A matrix $M \in M_{n\times n}(F)$ is invertible if and only if $\det(M) = 0$.
(d) A matrix $M \in M_{n\times n}(F)$ has rank $n$ if and only if $\det(M) \neq 0$.
(e) For any $A \in M_{n\times n}(F)$, $\det(A^t) = -\det(A)$.
(f) The determinant of a square matrix can be evaluated by cofactor expansion along any column.
(g) Every system of $n$ linear equations in $n$ unknowns can be solved by Cramer's rule.
(h) Let $Ax = b$ be the matrix form of a system of $n$ linear equations in $n$ unknowns, where $x = (x_1, x_2, \ldots, x_n)^t$. If $\det(A) \neq 0$ and if $M_k$ is the $n \times n$ matrix obtained from $A$ by replacing row $k$ of $A$ by $b^t$, then the unique solution of $Ax = b$ is
$$x_k = \frac{\det(M_k)}{\det(A)} \quad\text{for } k = 1, 2, \ldots, n.$$
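In Exercises 2-7, use Cramer's rule to solve the given system of linear equations.

2. $$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1\\ a_{21}x_1 + a_{22}x_2 &= b_2 \end{aligned}$$
where $a_{11}a_{22} - a_{12}a_{21} \neq 0$

3. $$\begin{aligned} 2x_1 + x_2 - 3x_3 &= 5\\ x_1 - 2x_2 + x_3 &= 10\\ 3x_1 + 4x_2 - 2x_3 &= 0 \end{aligned}$$

4. $$\begin{aligned} 2x_1 + x_2 - 3x_3 &= 1\\ x_1 - 2x_2 + x_3 &= 0\\ 3x_1 + 4x_2 - 2x_3 &= -5 \end{aligned}$$

5. $$\begin{aligned} x_1 - x_2 + 4x_3 &= -4\\ -8x_1 + 3x_2 + x_3 &= 8\\ 2x_1 - x_2 + x_3 &= 0 \end{aligned}$$

6. $$\begin{aligned} x_1 - x_2 + 4x_3 &= -2\\ -8x_1 + 3x_2 + x_3 &= 0\\ 2x_1 - x_2 + x_3 &= 6 \end{aligned}$$

7. $$\begin{aligned} 3x_1 + x_2 + x_3 &= 4\\ -2x_1 - x_2 &= 12\\ x_1 + 2x_2 + x_3 &= -8 \end{aligned}$$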
8. Use Theorem 4.8 to prove a result analogous to Theorem 4.3 (p. 212), 
but for columns. 
9. Prove that an upper triangular $n \times n$ matrix is invertible if and only if all its diagonal entries are nonzero.
10. A matrix $M \in M_{n\times n}(\mathbb{C})$ is called nilpotent if, for some positive integer $k$, $M^k = O$, where $O$ is the $n \times n$ zero matrix. Prove that if $M$ is nilpotent, then $\det(M) = 0$.


11. A matrix $M \in M_{n\times n}(\mathbb{C})$ is called skew-symmetric if $M^t = -M$. Prove that if $M$ is skew-symmetric and $n$ is odd, then $M$ is not invertible. What happens if $n$ is even?


12. A matrix $Q \in M_{n\times n}(\mathbb{R})$ is called orthogonal if $QQ^t = I$. Prove that if $Q$ is orthogonal, then $\det(Q) = \pm 1$.


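13. For $M \in M_{n\times n}(\mathbb{C})$, let $\overline{M}$ be the matrix such that $(\overline{M})_{ij} = \overline{M_{ij}}$ for all $i, j$, where $\overline{M_{ij}}$ is the complex conjugate of $M_{ij}$.

(a) Prove that $\det(\overline{M}) = \overline{\det(M)}$.
(b) A matrix $Q \in M_{n\times n}(\mathbb{C})$ is called unitary if $QQ^* = I$, where $Q^* = \overline{Q^t}$. Prove that if $Q$ is a unitary matrix, then $|\det(Q)| = 1$.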


14. Let $\beta = \{u_1, u_2, \ldots, u_n\}$ be a subset of $F^n$ containing $n$ distinct vectors, and let $B$ be the matrix in $M_{n\times n}(F)$ having $u_j$ as column $j$. Prove that $\beta$ is a basis for $F^n$ if and only if $\det(B) \neq 0$.


15.† Prove that if $A, B \in M_{n\times n}(F)$ are similar, then $\det(A) = \det(B)$.


16. Use determinants to prove that if $A, B \in M_{n\times n}(F)$ are such that $AB = I$, then $A$ is invertible (and hence $B = A^{-1}$).


17. Let $A, B \in M_{n\times n}(F)$ be such that $AB = -BA$. Prove that if $n$ is odd and $F$ is not a field of characteristic two, then $A$ or $B$ is not invertible.


18. Complete the proof of Theorem 4.7 by showing that if $A$ is an elementary matrix of type 2 or type 3, then $\det(AB) = \det(A)\cdot\det(B)$.


19. A matrix $A \in M_{n\times n}(F)$ is called lower triangular if $A_{ij} = 0$ for $1 \le i < j \le n$. Suppose that $A$ is a lower triangular matrix. Describe $\det(A)$ in terms of the entries of $A$.


20. Suppose that $M \in M_{n\times n}(F)$ can be written in the form

$$M = \begin{pmatrix} A & B\\ O & I \end{pmatrix},$$

where $A$ is a square matrix. Prove that $\det(M) = \det(A)$.


21.† Prove that if $M \in M_{n\times n}(F)$ can be written in the form

$$M = \begin{pmatrix} A & B\\ O & C \end{pmatrix},$$

where $A$ and $C$ are square matrices, then $\det(M) = \det(A)\cdot\det(C)$.
22. Let $T\colon P_n(F) \to F^{n+1}$ be the linear transformation defined in Exercise 22 of Section 2.4 by $T(f) = (f(c_0), f(c_1), \ldots, f(c_n))$, where $c_0, c_1, \ldots, c_n$ are distinct scalars in an infinite field $F$. Let $\beta$ be the standard ordered basis for $P_n(F)$ and $\gamma$ be the standard ordered basis for $F^{n+1}$.

(a) Show that $M = [T]_\beta^\gamma$ has the form
$$\begin{pmatrix} 1 & c_0 & c_0^2 & \cdots & c_0^n\\ 1 & c_1 & c_1^2 & \cdots & c_1^n\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & c_n & c_n^2 & \cdots & c_n^n \end{pmatrix}.$$
A matrix with this form is called a Vandermonde matrix.
(b) Use Exercise 22 of Section 2.4 to prove that $\det(M) \neq 0$.
(c) Prove that
$$\det(M) = \prod_{0 \le i < j \le n} (c_j - c_i),$$
the product of all terms of the form $c_j - c_i$ for $0 \le i < j \le n$.


23. Let $A \in M_{n\times n}(F)$ be nonzero. For any $m$ ($1 \le m \le n$), an $m \times m$ submatrix is obtained by deleting any $n - m$ rows and any $n - m$ columns of $A$.

(a) Let $k$ ($1 \le k \le n$) denote the largest integer such that some $k \times k$ submatrix has a nonzero determinant. Prove that $\operatorname{rank}(A) = k$.

(b) Conversely, suppose that $\operatorname{rank}(A) = k$. Prove that there exists a $k \times k$ submatrix with a nonzero determinant.


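24. Let $A \in M_{n\times n}(F)$ have the form

$$A = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & a_0\\ -1 & 0 & 0 & \cdots & 0 & a_1\\ 0 & -1 & 0 & \cdots & 0 & a_2\\ \vdots & \vdots & \vdots & & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & -1 & a_{n-1} \end{pmatrix}.$$

Compute $\det(A + tI)$, where $I$ is the $n \times n$ identity matrix.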


25. Let $c_{jk}$ denote the cofactor of the row $j$, column $k$ entry of the matrix $A \in M_{n\times n}(F)$.

(a) Prove that if $B$ is the matrix obtained from $A$ by replacing column $k$ by $e_j$, then $\det(B) = c_{jk}$.
(b) Show that for $1 \le j \le n$, we have
$$A\begin{pmatrix} c_{j1}\\ c_{j2}\\ \vdots\\ c_{jn} \end{pmatrix} = \det(A)\cdot e_j.$$
Hint: Apply Cramer's rule to $Ax = e_j$.
(c) Deduce that if $C$ is the $n \times n$ matrix such that $C_{ij} = c_{ji}$, then $AC = [\det(A)]I$.
(d) Show that if $\det(A) \neq 0$, then $A^{-1} = [\det(A)]^{-1}C$.
The following definition is used in Exercises 26-27. 
Definition. The classical adjoint of a square matrix A is the transpose 
of the matrix whose ij-entry is the ij-cofactor of A. 


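26. Find the classical adjoint of each of the following matrices.

(a) $\begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}$
(b) $\begin{pmatrix} 4 & 0 & 0\\ 0 & 4 & 0\\ 0 & 0 & 4 \end{pmatrix}$
(c) $\begin{pmatrix} -4 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 5 \end{pmatrix}$
(d) $\begin{pmatrix} 3 & 6 & 7\\ 0 & 4 & 8\\ 0 & 0 & 5 \end{pmatrix}$
(e) $\begin{pmatrix} 1-i & 0 & 0\\ 4 & 3i & 0\\ 2i & 1+4i & -1 \end{pmatrix}$
(f) $\begin{pmatrix} 7 & 1 & 4\\ 6 & -3 & 0\\ -3 & 5 & -2 \end{pmatrix}$
(g) $\begin{pmatrix} -1 & 2 & 5\\ 8 & 0 & -3\\ 4 & 6 & 1 \end{pmatrix}$
(h) $\begin{pmatrix} 3 & 2+i & 0\\ -1+i & 0 & i\\ 0 & 1 & 3-2i \end{pmatrix}$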
27. Let $C$ be the classical adjoint of $A \in M_{n\times n}(F)$. Prove the following statements.

(a) $\det(C) = [\det(A)]^{n-1}$.

(b) $C^t$ is the classical adjoint of $A^t$.

(c) If $A$ is an invertible upper triangular matrix, then $C$ and $A^{-1}$ are both upper triangular matrices.


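28. Let $y_1, y_2, \ldots, y_n$ be linearly independent functions in $C^\infty$. For each $y \in C^\infty$, define $T(y) \in C^\infty$ by

$$[T(y)](t) = \det\begin{pmatrix} y(t) & y_1(t) & y_2(t) & \cdots & y_n(t)\\ y'(t) & y_1'(t) & y_2'(t) & \cdots & y_n'(t)\\ \vdots & \vdots & \vdots & & \vdots\\ y^{(n)}(t) & y_1^{(n)}(t) & y_2^{(n)}(t) & \cdots & y_n^{(n)}(t) \end{pmatrix}.$$

The preceding determinant is called the Wronskian of $y, y_1, \ldots, y_n$.

(a) Prove that $T\colon C^\infty \to C^\infty$ is a linear transformation.
(b) Prove that $N(T)$ contains $\operatorname{span}(\{y_1, y_2, \ldots, y_n\})$.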


4.4 SUMMARY—IMPORTANT FACTS ABOUT DETERMINANTS 


In this section, we summarize the important properties of the determinant 
needed for the remainder of the text. The results contained in this section 
have been derived in Sections 4.2 and 4.3; consequently, the facts presented 
here are stated without proofs. 

The determinant of an $n \times n$ matrix $A$ having entries from a field $F$ is a scalar in $F$, denoted by $\det(A)$ or $|A|$, and can be computed in the following manner:

1. If $A$ is $1 \times 1$, then $\det(A) = A_{11}$, the single entry of $A$.

2. If $A$ is $2 \times 2$, then $\det(A) = A_{11}A_{22} - A_{12}A_{21}$.

3. If $A$ is $n \times n$ for $n > 2$, then
$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij}\cdot\det(\tilde{A}_{ij})$$
(if the determinant is evaluated by the entries of row $i$ of $A$) or
$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j} A_{ij}\cdot\det(\tilde{A}_{ij})$$
(if the determinant is evaluated by the entries of column $j$ of $A$), where $\tilde{A}_{ij}$ is the $(n-1) \times (n-1)$ matrix obtained by deleting row $i$ and column $j$ from $A$.

In the formulas above, the scalar $(-1)^{i+j}\det(\tilde{A}_{ij})$ is called the cofactor of the row $i$ column $j$ entry of $A$. In this language, the determinant of $A$ is evaluated as the sum of terms obtained by multiplying each entry of some row or column of $A$ by the cofactor of that entry. Thus $\det(A)$ is expressed in terms of $n$ determinants of $(n-1) \times (n-1)$ matrices. These determinants are then evaluated in terms of determinants of $(n-2) \times (n-2)$ matrices, and so forth, until $2 \times 2$ matrices are obtained. The determinants of the $2 \times 2$ matrices are then evaluated as in item 2.
Let us consider two examples of this technique in evaluating the determinant of the $4 \times 4$ matrix

$$A = \begin{pmatrix} 2 & 1 & 1 & 5\\ 1 & 1 & -4 & -1\\ 2 & 0 & -3 & 1\\ 3 & 6 & 1 & 2 \end{pmatrix}.$$

To evaluate the determinant of $A$ by expanding along the fourth row, we must know the cofactors of each entry of that row. The cofactor of $A_{41} = 3$ is $(-1)^{4+1}\det(B)$, where

$$B = \begin{pmatrix} 1 & 1 & 5\\ 1 & -4 & -1\\ 0 & -3 & 1 \end{pmatrix}.$$

Let us evaluate this determinant by expanding along the first column. We have

$$\begin{aligned}
\det(B) &= (-1)^{1+1}(1)\det\begin{pmatrix} -4 & -1\\ -3 & 1 \end{pmatrix} + (-1)^{2+1}(1)\det\begin{pmatrix} 1 & 5\\ -3 & 1 \end{pmatrix} + (-1)^{3+1}(0)\det\begin{pmatrix} 1 & 5\\ -4 & -1 \end{pmatrix}\\
&= 1[(-4)(1) - (-1)(-3)] - 1[(1)(1) - (5)(-3)] + 0\\
&= -7 - 16 + 0 = -23.
\end{aligned}$$

Thus the cofactor of $A_{41}$ is $(-1)^5(-23) = 23$. Similarly, the cofactors of $A_{42}$, $A_{43}$, and $A_{44}$ are $8$, $11$, and $-13$, respectively. We can now evaluate the determinant of $A$ by multiplying each entry of the fourth row by its cofactor; this gives

$$\det(A) = 3(23) + 6(8) + 1(11) + 2(-13) = 102.$$


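For the sake of comparison, let us also compute the determinant of $A$ by expansion along the second column. The reader should verify that the cofactors of $A_{12}$, $A_{22}$, and $A_{42}$ are $14$, $40$, and $8$, respectively. Thus

$$\begin{aligned}
\det(A) &= (-1)^{1+2}(1)\det\begin{pmatrix} 1 & -4 & -1\\ 2 & -3 & 1\\ 3 & 1 & 2 \end{pmatrix} + (-1)^{2+2}(1)\det\begin{pmatrix} 2 & 1 & 5\\ 2 & -3 & 1\\ 3 & 1 & 2 \end{pmatrix}\\
&\quad + (-1)^{3+2}(0)\det\begin{pmatrix} 2 & 1 & 5\\ 1 & -4 & -1\\ 3 & 1 & 2 \end{pmatrix} + (-1)^{4+2}(6)\det\begin{pmatrix} 2 & 1 & 5\\ 1 & -4 & -1\\ 2 & -3 & 1 \end{pmatrix}\\
&= 14 + 40 + 0 + 48 = 102.
\end{aligned}$$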
Of course, the fact that the value 102 is obtained again is no surprise since the
value of the determinant of A is independent of the choice of row or column
used in the expansion.
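The independence from the chosen row or column is easy to confirm for the matrix $A$ above. This sketch uses hypothetical helper names and 0-based indices. It computes each cofactor as $(-1)^{i+j}$ times the determinant of the submatrix obtained by deleting row $i$ and column $j$, working recursively, and it then expands along every row and every column.

```python
def minor(A, i, j):
    """Submatrix obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Recursive cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cofactor(A, i, j):
    return (-1) ** (i + j) * det(minor(A, i, j))

def expand_along_row(A, i):
    return sum(A[i][j] * cofactor(A, i, j) for j in range(len(A)))

def expand_along_column(A, j):
    return sum(A[i][j] * cofactor(A, i, j) for i in range(len(A)))

A = [[2, 1, 1, 5],
     [1, 1, -4, -1],
     [2, 0, -3, 1],
     [3, 6, 1, 2]]

print([cofactor(A, 3, j) for j in range(4)])  # [23, 8, 11, -13]  (fourth row)
print([cofactor(A, i, 1) for i in range(4)])  # [14, 40, -46, 8]  (second column)
print({expand_along_row(A, i) for i in range(4)}
      | {expand_along_column(A, j) for j in range(4)})  # {102}
```

The cofactor of $A_{32}$ ($-46$) is computed here only for completeness. In the hand computation it is multiplied by $A_{32} = 0$ and never needs to be evaluated.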

Observe that the computation of $\det(A)$ is easier when expanded along the second column than when expanded along the fourth row. The difference is the presence of a zero in the second column, which makes it unnecessary to evaluate one of the cofactors (the cofactor of $A_{32}$). For this reason, it is
beneficial to evaluate the determinant of a matrix by expanding along a row or 
column of the matrix that contains the largest number of zero entries. In fact, 
it is often helpful to introduce zeros into the matrix by means of elementary 
row operations before computing the determinant. This technique utilizes 
the first three properties of the determinant. 


Properties of the Determinant 


1. If $B$ is a matrix obtained by interchanging any two rows or interchanging any two columns of an $n \times n$ matrix $A$, then $\det(B) = -\det(A)$.

2. If $B$ is a matrix obtained by multiplying each entry of some row or column of an $n \times n$ matrix $A$ by a scalar $k$, then $\det(B) = k\cdot\det(A)$.

3. If $B$ is a matrix obtained from an $n \times n$ matrix $A$ by adding a multiple of row $i$ to row $j$ or a multiple of column $i$ to column $j$ for $i \neq j$, then $\det(B) = \det(A)$.


As an example of the use of these three properties in evaluating determinants, let us compute the determinant of the $4 \times 4$ matrix $A$ considered previously. Our procedure is to introduce zeros into the second column of $A$ by employing property 3, and then to expand along that column. (The elementary row operations used here consist of adding multiples of row 1 to rows 2 and 4.) This procedure yields

$$\begin{aligned}
\det(A) &= \det\begin{pmatrix} 2 & 1 & 1 & 5\\ 1 & 1 & -4 & -1\\ 2 & 0 & -3 & 1\\ 3 & 6 & 1 & 2 \end{pmatrix} = \det\begin{pmatrix} 2 & 1 & 1 & 5\\ -1 & 0 & -5 & -6\\ 2 & 0 & -3 & 1\\ -9 & 0 & -5 & -28 \end{pmatrix}\\
&= 1(-1)^{1+2}\det\begin{pmatrix} -1 & -5 & -6\\ 2 & -3 & 1\\ -9 & -5 & -28 \end{pmatrix}.
\end{aligned}$$

The resulting determinant of a $3 \times 3$ matrix can be evaluated in the same manner: Use type 3 elementary row operations to introduce two zeros into the first column, and then expand along that column. This results in the value $-102$. Therefore

$$\det(A) = 1(-1)^{1+2}(-102) = 102.$$
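The row-operation computation above can be replayed in code. This is a minimal sketch with hypothetical helper names and 0-based indices, so the text's row 1 is index 0. It applies the two type 3 operations, checks that the determinant is unchanged, and then expands along the second column, which now has a single nonzero entry.

```python
def minor(A, i, j):
    """Submatrix obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first row (exact for integer entries)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def add_row_multiple(A, src, dst, c):
    """Copy of A with c times row src added to row dst (a type 3 operation)."""
    B = [row[:] for row in A]
    B[dst] = [x + c * y for x, y in zip(B[dst], B[src])]
    return B

A = [[2, 1, 1, 5],
     [1, 1, -4, -1],
     [2, 0, -3, 1],
     [3, 6, 1, 2]]

B = add_row_multiple(A, 0, 1, -1)   # row 2 minus row 1
B = add_row_multiple(B, 0, 3, -6)   # row 4 minus 6 times row 1
print(B)                  # [[2, 1, 1, 5], [-1, 0, -5, -6], [2, 0, -3, 1], [-9, 0, -5, -28]]
print(det(B) == det(A))   # True (property 3)

# Column 2 now has the single nonzero entry 1 in row 1, so the expansion
# along that column has one term: 1 * (-1)^(1+2) * det(3x3 minor).
print(det(minor(B, 0, 1)))                              # -102
print(B[0][1] * (-1) ** (1 + 2) * det(minor(B, 0, 1)))  # 102
```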
The reader should compare this calculation of det(A) with the preceding
ones to see how much less work is required when properties 1, 2, and 3 are
employed.

In the chapters that follow, we often have to evaluate the determinant of 
matrices having special forms. The next two properties of the determinant 
are useful in this regard: 


4. The determinant of an upper triangular matrix is the product of its diagonal entries. In particular, $\det(I) = 1$.

5. If two rows (or columns) of a matrix are identical, then the determinant of the matrix is zero.

As an illustration of property 4, notice that

$$\det\begin{pmatrix} -3 & 1 & 2\\ 0 & 4 & 5\\ 0 & 0 & -6 \end{pmatrix} = (-3)(4)(-6) = 72.$$


Property 4 provides an efficient method for evaluating the determinant of a 
matrix: 


(a) Use Gaussian elimination and properties 1, 2, and 3 above to reduce the 
matrix to an upper triangular matrix. 


(b) Compute the product of the diagonal entries. 


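For instance,

$$\begin{aligned}
\det\begin{pmatrix} 1 & -1 & 2 & 1\\ 2 & -1 & -1 & 4\\ -4 & 5 & -10 & -6\\ 3 & -2 & 10 & -1 \end{pmatrix}
&= \det\begin{pmatrix} 1 & -1 & 2 & 1\\ 0 & 1 & -5 & 2\\ 0 & 1 & -2 & -2\\ 0 & 1 & 4 & -4 \end{pmatrix}
= \det\begin{pmatrix} 1 & -1 & 2 & 1\\ 0 & 1 & -5 & 2\\ 0 & 0 & 3 & -4\\ 0 & 0 & 9 & -6 \end{pmatrix}\\
&= \det\begin{pmatrix} 1 & -1 & 2 & 1\\ 0 & 1 & -5 & 2\\ 0 & 0 & 3 & -4\\ 0 & 0 & 0 & 6 \end{pmatrix} = 1\cdot 1\cdot 3\cdot 6 = 18.
\end{aligned}$$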
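The two-step method (a) and (b) is essentially how determinants are computed in practice. The sketch below is one possible implementation and assumes exact rational arithmetic via `Fraction`. It uses only row interchanges, each of which flips the sign (property 1), and additions of a multiple of one row to another, which change nothing (property 3). It then multiplies the diagonal entries (property 4). The function name is hypothetical.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via reduction to upper triangular form with type 1 and type 3 row operations."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot available: rank < n, so det = 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign            # property 1: an interchange negates the determinant
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # property 3: adding a multiple of a row leaves the determinant unchanged
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]           # property 4: product of the diagonal entries
    return result

example = [[1, -1, 2, 1], [2, -1, -1, 4], [-4, 5, -10, -6], [3, -2, 10, -1]]
print(det_by_elimination(example))                              # 18
print(det_by_elimination([[-3, 1, 2], [0, 4, 5], [0, 0, -6]]))  # 72
```

Elimination needs roughly $n^3$ arithmetic operations, compared with the roughly $n!$ of a full cofactor expansion. That gap is why property 4 gives an efficient method.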


The next three properties of the determinant are used frequently in later 
chapters. Indeed, perhaps the most significant property of the determinant 
is that it provides a simple characterization of invertible matrices. (See prop- 
erty 7.) 


6. For any $n \times n$ matrices $A$ and $B$, $\det(AB) = \det(A)\cdot\det(B)$.
7. An $n \times n$ matrix $A$ is invertible if and only if $\det(A) \neq 0$. Furthermore, if $A$ is invertible, then
$$\det(A^{-1}) = \frac{1}{\det(A)}.$$

8. For any $n \times n$ matrix $A$, the determinants of $A$ and $A^t$ are equal.

For example, property 7 guarantees that the matrix $A$ on page 233 is invertible because $\det(A) = 102$.

The final property, stated as Exercise 15 of Section 4.3, is used in Chapter 5. It is a simple consequence of properties 6 and 7.

9. If $A$ and $B$ are similar matrices, then $\det(A) = \det(B)$.
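Properties 6, 7, and 9 can be spot-checked on small examples. This sketch uses hypothetical $2 \times 2$ matrices. It computes the inverse by the formula $M^{-1} = \frac{1}{\det(M)}\begin{pmatrix} M_{22} & -M_{12}\\ -M_{21} & M_{11} \end{pmatrix}$, which is valid for invertible $2 \times 2$ matrices. The checks are illustrations, not proofs.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse2(M):
    """Inverse of an invertible 2x2 matrix, with exact Fraction entries."""
    d = Fraction(det2(M))
    return [[M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d, M[0][0] / d]]

# Hypothetical example matrices.
A = [[1, 2], [3, 4]]
B = [[2, 1], [1, 1]]
P = [[1, 1], [0, 1]]

print(det2(matmul(A, B)) == det2(A) * det2(B))            # True: property 6
print(det2(inverse2(A)) == Fraction(1, det2(A)))          # True: property 7
print(det2(matmul(matmul(inverse2(P), A), P)) == det2(A)) # True: property 9
```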


EXERCISES 


1. Label the following statements as true or false.

(a) The determinant of a square matrix may be computed by expanding the matrix along any row or column.

(b) In evaluating the determinant of a matrix, it is wise to expand along a row or column containing the largest number of zero entries.

(c) If two rows or columns of $A$ are identical, then $\det(A) = 0$.

(d) If $B$ is a matrix obtained by interchanging two rows or two columns of $A$, then $\det(B) = \det(A)$.

(e) If $B$ is a matrix obtained by multiplying each entry of some row or column of $A$ by a scalar, then $\det(B) = \det(A)$.

(f) If $B$ is a matrix obtained from $A$ by adding a multiple of some row to a different row, then $\det(B) = \det(A)$.

(g) The determinant of an upper triangular $n \times n$ matrix is the product of its diagonal entries.

(h) For every $A \in M_{n\times n}(F)$, $\det(A^t) = -\det(A)$.

(i) If $A, B \in M_{n\times n}(F)$, then $\det(AB) = \det(A)\cdot\det(B)$.

(j) If $Q$ is an invertible matrix, then $\det(Q^{-1}) = [\det(Q)]^{-1}$.

(k) A matrix $Q$ is invertible if and only if $\det(Q) \neq 0$.


2. Evaluate the determinant of the following 2 x 2 matrices. 
4-5 -1 7 
@ (3 73) (3 3) 
2+% -1+32 3. 4 
ke) ¢ = Baa ) (4) ee 


3. Evaluate the determinant of the following matrices in the manner indi- 
cated. 
0 1 2 1 0 2
-1 oO -8 0 1 5 
(a) 2 3 0 ce) -1 3 0 
along the first row along the first column 
0 1 2 1 0 2 
-1 OO -8 0 1 5 
(c) 38 0 id) -1 3 0 
along the second column along the third row 
O 1+%t 2 a 2+7 0 
—2i 0 1l-i —1 3 2i 
(130 4 0 a Cees ee 
along the third row along the third column 
0 2 1 8 1-1 2 -1 
1 0 -2 2 -3 4 1 -1 
(g) 3 -1 O 1 (h) 2 -5 -3 8 
-1 1 2 O —2 6 -4 1 
along the fourth column along the fourth row 


4. Evaluate the determinant of the following matrices by any legitimate 


method. 
1 2 3 —1 3 2 
(a) |4 5 6 (b) 4 -—8 1 
7 8 9 2 2 5 
0 1 1 1 -2 3 
(c) | 1 2 —5 (d) | -1 2 —5 
—4 3 —1 2 


a a Si Bae SB 
(e){ 3 14% 2 (f)[1-i i 1 


1 0-2 8 t =2 3 S19 

Go pt a. 25. (Lota ig 
(| o 44 1 (h) |} 9 99 99 31 
i tg Ge a SAS +B ia) AG 


5. Suppose that $M \in M_{n\times n}(F)$ can be written in the form

$$M = \begin{pmatrix} A & B\\ O & I \end{pmatrix},$$

where $A$ is a square matrix. Prove that $\det(M) = \det(A)$.
6.† Prove that if $M \in M_{n\times n}(F)$ can be written in the form

$$M = \begin{pmatrix} A & B\\ O & C \end{pmatrix},$$

where $A$ and $C$ are square matrices, then $\det(M) = \det(A)\cdot\det(C)$.


4.5* A CHARACTERIZATION OF THE DETERMINANT 


In Sections 4.2 and 4.3, we showed that the determinant possesses a number of properties. In this section, we show that three of these properties completely characterize the determinant; that is, the only function $\delta\colon M_{n\times n}(F) \to F$ having these three properties is the determinant. This characterization of the determinant is the one used in Section 4.1 to establish the relationship between $\det\begin{pmatrix} u\\ v \end{pmatrix}$ and the area of the parallelogram determined by $u$ and $v$. The first of these properties that characterize the determinant is the one described in Theorem 4.3 (p. 212).


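Definition. A function $\delta\colon M_{n\times n}(F) \to F$ is called an n-linear function if it is a linear function of each row of an $n \times n$ matrix when the remaining $n - 1$ rows are held fixed, that is, $\delta$ is n-linear if, for every $r = 1, 2, \ldots, n$, we have

$$\delta\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ u + kv\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix} = \delta\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ u\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix} + k\,\delta\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ v\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix}$$

whenever $k$ is a scalar and $u$, $v$, and each $a_i$ are vectors in $F^n$.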


Example 1

The function $\delta\colon M_{n\times n}(F) \to F$ defined by $\delta(A) = 0$ for each $A \in M_{n\times n}(F)$ is an n-linear function.


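Example 2

For $1 \le j \le n$, define $\delta_j\colon M_{n\times n}(F) \to F$ by $\delta_j(A) = A_{1j}A_{2j}\cdots A_{nj}$ for each $A \in M_{n\times n}(F)$; that is, $\delta_j(A)$ equals the product of the entries of column $j$ of $A$. Let $A \in M_{n\times n}(F)$, $a_i = (A_{i1}, A_{i2}, \ldots, A_{in})$, and $v = (b_1, b_2, \ldots, b_n) \in F^n$. Then each $\delta_j$ is an n-linear function because, for any scalar $k$, we have

$$\begin{aligned}
\delta_j\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ a_r + kv\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix}
&= A_{1j}\cdots A_{(r-1)j}(A_{rj} + kb_j)A_{(r+1)j}\cdots A_{nj}\\
&= A_{1j}\cdots A_{(r-1)j}A_{rj}A_{(r+1)j}\cdots A_{nj} + A_{1j}\cdots A_{(r-1)j}(kb_j)A_{(r+1)j}\cdots A_{nj}\\
&= A_{1j}\cdots A_{(r-1)j}A_{rj}A_{(r+1)j}\cdots A_{nj} + k\bigl(A_{1j}\cdots A_{(r-1)j}\,b_j\,A_{(r+1)j}\cdots A_{nj}\bigr)\\
&= \delta_j\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ a_r\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix} + k\,\delta_j\begin{pmatrix} a_1\\ \vdots\\ a_{r-1}\\ v\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix}. \qquad ◆
\end{aligned}$$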

Example 3

The function $\delta\colon M_{n\times n}(F) \to F$ defined for each $A \in M_{n\times n}(F)$ by $\delta(A) = A_{11}A_{22}\cdots A_{nn}$ (i.e., $\delta(A)$ equals the product of the diagonal entries of $A$) is an n-linear function.


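Example 4

The function $\delta\colon M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ defined for each $A \in M_{n\times n}(\mathbb{R})$ by $\delta(A) = \operatorname{tr}(A)$ is not an n-linear function for $n \ge 2$. For if $I$ is the $n \times n$ identity matrix and $A$ is the matrix obtained by multiplying the first row of $I$ by 2, then $\delta(A) = n + 1 \neq 2n = 2\cdot\delta(I)$.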
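n-linearity can also be probed numerically. The function below is a randomized test, not a proof. It draws random integer matrices, a random row index $r$, rows $u$ and $v$, and a scalar $k$, and it compares both sides of the defining identity. A single failure shows that a function is not n-linear, while passing only provides evidence. The function names, the trial count, and the seed are arbitrary choices made for this sketch.

```python
import math
import random

def replace_row(rows, r, x):
    """Copy of the matrix `rows` with row r replaced by x."""
    return rows[:r] + [x] + rows[r + 1:]

def is_n_linear(delta, n, trials=200, seed=0):
    """Randomized check of delta(.., u + k v, ..) == delta(.., u, ..) + k * delta(.., v, ..)."""
    rng = random.Random(seed)

    def rand_row():
        return [rng.randint(-5, 5) for _ in range(n)]

    for _ in range(trials):
        rows = [rand_row() for _ in range(n)]
        r = rng.randrange(n)
        u, v, k = rand_row(), rand_row(), rng.randint(-5, 5)
        combo = [a + k * b for a, b in zip(u, v)]
        lhs = delta(replace_row(rows, r, combo))
        rhs = delta(replace_row(rows, r, u)) + k * delta(replace_row(rows, r, v))
        if lhs != rhs:
            return False   # counterexample found: delta is not n-linear
    return True            # no counterexample found (evidence only)

def column_product(A):    # delta_1 of Example 2 (first column)
    return math.prod(row[0] for row in A)

def diagonal_product(A):  # Example 3
    return math.prod(A[i][i] for i in range(len(A)))

def trace(A):             # Example 4
    return sum(A[i][i] for i in range(len(A)))

print(is_n_linear(column_product, 3), is_n_linear(diagonal_product, 3), is_n_linear(trace, 3))
# True True False

# The explicit counterexample of Example 4 with n = 3:
I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
A = [[2 * x for x in I[0]]] + I[1:]
print(trace(A), 2 * trace(I))  # 4 6
```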


Theorem 4.3 (p. 212) asserts that the determinant is an n-linear function. 
For our purposes this is the most important example of an n-linear function. 
Now we introduce the second of the properties used in the characterization 
of the determinant. 


Definition. An n-linear function $\delta\colon M_{n\times n}(F) \to F$ is called alternating if, for each $A \in M_{n\times n}(F)$, we have $\delta(A) = 0$ whenever two adjacent rows of $A$ are identical.
Theorem 4.10. Let $\delta\colon M_{n\times n}(F) \to F$ be an alternating n-linear function.
(a) If $A \in M_{n\times n}(F)$ and $B$ is a matrix obtained from $A$ by interchanging any two rows of $A$, then $\delta(B) = -\delta(A)$.
(b) If $A \in M_{n\times n}(F)$ has two identical rows, then $\delta(A) = 0$.


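Proof. (a) Let $A \in M_{n\times n}(F)$, and let $B$ be the matrix obtained from $A$ by interchanging rows $r$ and $s$, where $r < s$. We first establish the result in the case that $s = r + 1$. Write $a_1, a_2, \ldots, a_n$ for the rows of $A$. Because $\delta\colon M_{n\times n}(F) \to F$ is an n-linear function that is alternating, we have

$$\begin{aligned}
0 &= \delta\begin{pmatrix} a_1\\ \vdots\\ a_r + a_{r+1}\\ a_r + a_{r+1}\\ \vdots\\ a_n \end{pmatrix}
= \delta\begin{pmatrix} a_1\\ \vdots\\ a_r\\ a_r + a_{r+1}\\ \vdots\\ a_n \end{pmatrix}
+ \delta\begin{pmatrix} a_1\\ \vdots\\ a_{r+1}\\ a_r + a_{r+1}\\ \vdots\\ a_n \end{pmatrix}\\
&= \delta\begin{pmatrix} a_1\\ \vdots\\ a_r\\ a_r\\ \vdots\\ a_n \end{pmatrix}
+ \delta\begin{pmatrix} a_1\\ \vdots\\ a_r\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix}
+ \delta\begin{pmatrix} a_1\\ \vdots\\ a_{r+1}\\ a_r\\ \vdots\\ a_n \end{pmatrix}
+ \delta\begin{pmatrix} a_1\\ \vdots\\ a_{r+1}\\ a_{r+1}\\ \vdots\\ a_n \end{pmatrix}\\
&= 0 + \delta(A) + \delta(B) + 0.
\end{aligned}$$

Thus $\delta(B) = -\delta(A)$.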

Next suppose that $s > r + 1$, and let the rows of $A$ be $a_1, a_2, \ldots, a_n$. Beginning with $a_r$ and $a_{r+1}$, successively interchange $a_r$ with the row that follows it until the rows are in the sequence

$$a_1, a_2, \ldots, a_{r-1}, a_{r+1}, \ldots, a_s, a_r, a_{s+1}, \ldots, a_n.$$

In all, $s - r$ interchanges of adjacent rows are needed to produce this sequence. Then successively interchange $a_s$ with the row that precedes it until the rows are in the order

$$a_1, a_2, \ldots, a_{r-1}, a_s, a_{r+1}, \ldots, a_{s-1}, a_r, a_{s+1}, \ldots, a_n.$$

This process requires an additional $s - r - 1$ interchanges of adjacent rows and produces the matrix $B$. It follows from the preceding paragraph that

$$\delta(B) = (-1)^{(s-r)+(s-r-1)}\,\delta(A) = -\delta(A).$$


(b) Suppose that rows $r$ and $s$ of $A \in M_{n\times n}(F)$ are identical, where $r < s$. If $s = r + 1$, then $\delta(A) = 0$ because $\delta$ is alternating and two adjacent rows of $A$ are identical. If $s > r + 1$, let $B$ be the matrix obtained from $A$ by interchanging rows $r + 1$ and $s$. Then $\delta(B) = 0$ because two adjacent rows of $B$ are identical. But $\delta(B) = -\delta(A)$ by (a). Hence $\delta(A) = 0$. ∎
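The counting argument in part (a) can be simulated. The sketch below uses a hypothetical function name and 0-based positions. It performs exactly the adjacent interchanges described in the proof and returns how many were used. The count is always $2(s-r)-1$, an odd number, which is why $\delta(B) = -\delta(A)$.

```python
def interchange_by_adjacent_swaps(rows, r, s):
    """Interchange positions r < s using only adjacent interchanges, as in the proof.
    Returns (new list, number of adjacent interchanges used)."""
    rows = list(rows)
    count = 0
    # Phase 1: s - r adjacent swaps carry the original a_r down to position s.
    for i in range(r, s):
        rows[i], rows[i + 1] = rows[i + 1], rows[i]
        count += 1
    # Phase 2: a_s now sits at position s - 1; s - r - 1 swaps carry it up to position r.
    for i in range(s - 1, r, -1):
        rows[i], rows[i - 1] = rows[i - 1], rows[i]
        count += 1
    return rows, count

labels = ["a1", "a2", "a3", "a4", "a5", "a6"]
new, k = interchange_by_adjacent_swaps(labels, 1, 4)  # interchange a2 and a5
print(new, k)  # ['a1', 'a5', 'a3', 'a4', 'a2', 'a6'] 5
```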


Corollary 1. Let $\delta\colon M_{n\times n}(F) \to F$ be an alternating n-linear function. If $B$ is a matrix obtained from $A \in M_{n\times n}(F)$ by adding a multiple of some row of $A$ to another row, then $\delta(B) = \delta(A)$.

Proof. Let $B$ be obtained from $A \in M_{n\times n}(F)$ by adding $k$ times row $i$ of $A$ to row $j$, where $j \neq i$, and let $C$ be obtained from $A$ by replacing row $j$ of $A$ by row $i$ of $A$. Then the rows of $A$, $B$, and $C$ are identical except for row $j$. Moreover, row $j$ of $B$ is the sum of row $j$ of $A$ and $k$ times row $j$ of $C$. Since $\delta$ is an n-linear function and $C$ has two identical rows, it follows that

$$\delta(B) = \delta(A) + k\,\delta(C) = \delta(A) + k\cdot 0 = \delta(A). \qquad ∎$$


The next result now follows as in the proof of the corollary to Theorem 4.6 (p. 216). (See Exercise 11.)

Corollary 2. Let $\delta\colon M_{n\times n}(F) \to F$ be an alternating n-linear function. If $M \in M_{n\times n}(F)$ has rank less than $n$, then $\delta(M) = 0$.

Proof. Exercise. ∎

Corollary 3. Let $\delta\colon M_{n\times n}(F) \to F$ be an alternating n-linear function, and let $E_1$, $E_2$, and $E_3$ in $M_{n\times n}(F)$ be elementary matrices of types 1, 2, and 3, respectively. Suppose that $E_2$ is obtained by multiplying some row of $I$ by the nonzero scalar $k$. Then $\delta(E_1) = -\delta(I)$, $\delta(E_2) = k\cdot\delta(I)$, and $\delta(E_3) = \delta(I)$.

Proof. Exercise. ∎


We wish to show that under certain circumstances, the only alternating n-linear function $\delta\colon M_{n\times n}(F) \to F$ is the determinant, that is, $\delta(A) = \det(A)$ for all $A \in M_{n\times n}(F)$. In view of Corollary 3 to Theorem 4.10 and the facts on page 223 about the determinant of an elementary matrix, this can happen only if $\delta(I) = 1$. Hence the third condition that is used in the characterization of the determinant is that the determinant of the $n \times n$ identity matrix is 1. Before we can establish the desired characterization of the determinant, we must first show that an alternating n-linear function $\delta$ such that $\delta(I) = 1$ is a multiplicative function. The proof of this result is identical to the proof of Theorem 4.7 (p. 223), and so it is omitted. (See Exercise 12.)

Theorem 4.11. Let $\delta\colon M_{n\times n}(F) \to F$ be an alternating n-linear function such that $\delta(I) = 1$. For any $A, B \in M_{n\times n}(F)$, we have $\delta(AB) = \delta(A)\cdot\delta(B)$.
Theorem 4.12. If $\delta\colon M_{n\times n}(F) \to F$ is an alternating n-linear function such that $\delta(I) = 1$, then $\delta(A) = \det(A)$ for every $A \in M_{n\times n}(F)$.

Proof. Let $\delta\colon M_{n\times n}(F) \to F$ be an alternating n-linear function such that $\delta(I) = 1$, and let $A \in M_{n\times n}(F)$. If $A$ has rank less than $n$, then by Corollary 2 to Theorem 4.10, $\delta(A) = 0$. Since the corollary to Theorem 4.6 (p. 217) gives $\det(A) = 0$, we have $\delta(A) = \det(A)$ in this case. If, on the other hand, $A$ has rank $n$, then $A$ is invertible and hence is the product of elementary matrices (Corollary 3 to Theorem 3.6, p. 159), say $A = E_m \cdots E_2 E_1$. Since $\delta(I) = 1$, it follows from Corollary 3 to Theorem 4.10 and the facts on page 223 that $\delta(E) = \det(E)$ for every elementary matrix $E$. Hence by Theorems 4.11 and 4.7 (p. 223), we have

$$\begin{aligned}
\delta(A) &= \delta(E_m \cdots E_2 E_1)\\
&= \delta(E_m)\cdots\delta(E_2)\cdot\delta(E_1)\\
&= \det(E_m)\cdots\det(E_2)\cdot\det(E_1)\\
&= \det(E_m \cdots E_2 E_1)\\
&= \det(A). \qquad ∎
\end{aligned}$$

Theorem 4.12 provides the desired characterization of the determinant: It is the unique function $\delta\colon M_{n\times n}(F) \to F$ that is n-linear, is alternating, and has the property that $\delta(I) = 1$.


EXERCISES 


1. Label the following statements as true or false.

(a) Any n-linear function $\delta\colon M_{n\times n}(F) \to F$ is a linear transformation.

(b) Any n-linear function $\delta\colon M_{n\times n}(F) \to F$ is a linear function of each row of an $n \times n$ matrix when the other $n - 1$ rows are held fixed.

(c) If $\delta\colon M_{n\times n}(F) \to F$ is an alternating n-linear function and the matrix $A \in M_{n\times n}(F)$ has two identical rows, then $\delta(A) = 0$.

(d) If $\delta\colon M_{n\times n}(F) \to F$ is an alternating n-linear function and $B$ is obtained from $A \in M_{n\times n}(F)$ by interchanging two rows of $A$, then $\delta(B) = \delta(A)$.

(e) There is a unique alternating n-linear function $\delta\colon M_{n\times n}(F) \to F$.

(f) The function $\delta\colon M_{n\times n}(F) \to F$ defined by $\delta(A) = 0$ for every $A \in M_{n\times n}(F)$ is an alternating n-linear function.

2. Determine all the 1-linear functions $\delta\colon M_{1\times 1}(F) \to F$.


Determine which of the functions $\delta\colon M_{3\times 3}(F) \to F$ in Exercises 3-10 are 3-linear functions. Justify each answer.

3. $\delta(A) = k$, where $k$ is any nonzero scalar
4. $\delta(A) = A_{22}$
5. $\delta(A) = A_{11}A_{23}A_{32}$
6. $\delta(A) = A_{11} + A_{23} + A_{32}$
7. $\delta(A) = A_{11}A_{21}A_{32}$
8. $\delta(A) = A_{11}A_{31}A_{32}$
9. $\delta(A) = A_{11}^2A_{22}^2A_{33}^2$
10. $\delta(A) = A_{11}A_{22}A_{33} - A_{11}A_{21}A_{32}$

11. Prove Corollaries 2 and 3 of Theorem 4.10.

12. Prove Theorem 4.11.

13. Prove that $\det\colon M_{2\times 2}(F) \to F$ is a 2-linear function of the columns of a matrix.

14. Let $a, b, c, d \in F$. Prove that the function $\delta\colon M_{2\times 2}(F) \to F$ defined by $\delta(A) = A_{11}A_{22}a + A_{11}A_{21}b + A_{12}A_{22}c + A_{12}A_{21}d$ is a 2-linear function.

15. Prove that $\delta\colon M_{2\times 2}(F) \to F$ is a 2-linear function if and only if it has the form
$$\delta(A) = A_{11}A_{22}a + A_{11}A_{21}b + A_{12}A_{22}c + A_{12}A_{21}d$$
for some scalars $a, b, c, d \in F$.

16. Prove that if $\delta\colon M_{n\times n}(F) \to F$ is an alternating n-linear function, then there exists a scalar $k$ such that $\delta(A) = k\det(A)$ for all $A \in M_{n\times n}(F)$.

17. Prove that a linear combination of two n-linear functions is an n-linear function, where the sum and scalar product of n-linear functions are as defined in Example 3 of Section 1.2 (p. 9).

18. Prove that the set of all n-linear functions over a field $F$ is a vector space over $F$ under the operations of function addition and scalar multiplication as defined in Example 3 of Section 1.2 (p. 9).

19. Let $\delta\colon M_{n\times n}(F) \to F$ be an n-linear function and $F$ a field that does not have characteristic two. Prove that if $\delta(B) = -\delta(A)$ whenever $B$ is obtained from $A \in M_{n\times n}(F)$ by interchanging any two rows of $A$, then $\delta(M) = 0$ whenever $M \in M_{n\times n}(F)$ has two identical rows.

20. Give an example to show that the implication in Exercise 19 need not hold if $F$ has characteristic two.
Diagonalization


5.1 Eigenvalues and Eigenvectors 

5.2 Diagonalizability 

5.3* Matrix Limits and Markov Chains 

5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 


This chapter is concerned with the so-called diagonalization problem. For 
a given linear operator T on a finite-dimensional vector space V, we seek 
answers to the following questions. 


1. Does there exist an ordered basis $\beta$ for V such that $[T]_\beta$ is a diagonal matrix?
2. If such a basis exists, how can it be found? 


Since computations involving diagonal matrices are simple, an affirmative 
answer to question 1 leads us to a clearer understanding of how the operator T 
acts on V, and an answer to question 2 enables us to obtain easy solutions to 
many practical problems that can be formulated in a linear algebra context. 
We consider some of these problems and their solutions in this chapter; see, 
for example, Section 5.3. 

A solution to the diagonalization problem leads naturally to the concepts 
of eigenvalue and eigenvector. Aside from the important role that these 
concepts play in the diagonalization problem, they also prove to be useful 
tools in the study of many nondiagonalizable operators, as we will see in 
Chapter 7. 


5.1 EIGENVALUES AND EIGENVECTORS

In Example 3 of Section 2.5, we were able to obtain a formula for the reflection of $\mathbb{R}^2$ about the line $y = 2x$. The key to our success was to find a basis $\beta'$ for which $[T]_{\beta'}$ is a diagonal matrix. We now introduce the name for an operator or matrix that has such a basis.

Definitions. A linear operator T on a finite-dimensional vector space V is called diagonalizable if there is an ordered basis $\beta$ for V such that $[T]_\beta$ is a diagonal matrix. A square matrix $A$ is called diagonalizable if $L_A$ is diagonalizable.


We want to determine when a linear operator T on a finite-dimensional vector space V is diagonalizable and, if so, how to obtain an ordered basis $\beta = \{v_1, v_2, \ldots, v_n\}$ for V such that $[T]_\beta$ is a diagonal matrix. Note that, if $D = [T]_\beta$ is a diagonal matrix, then for each vector $v_j \in \beta$, we have

$$T(v_j) = \sum_{i=1}^{n} D_{ij} v_i = D_{jj} v_j = \lambda_j v_j,$$

where $\lambda_j = D_{jj}$.

Conversely, if $\beta = \{v_1, v_2, \ldots, v_n\}$ is an ordered basis for V such that $T(v_j) = \lambda_j v_j$ for some scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$, then clearly

$$[T]_\beta = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$


In the preceding paragraph, each vector v in the basis $\beta$ satisfies the condition that $T(v) = \lambda v$ for some scalar $\lambda$. Moreover, because v lies in a basis, v is nonzero. These computations motivate the following definitions.


Definitions. Let T be a linear operator on a vector space V. A nonzero vector $v \in V$ is called an eigenvector of T if there exists a scalar $\lambda$ such that $T(v) = \lambda v$. The scalar $\lambda$ is called the eigenvalue corresponding to the eigenvector v.

Let A be in $M_{n\times n}(F)$. A nonzero vector $v \in F^n$ is called an eigenvector of A if v is an eigenvector of $L_A$; that is, if $Av = \lambda v$ for some scalar $\lambda$. The scalar $\lambda$ is called the eigenvalue of A corresponding to the eigenvector v.


The words characteristic vector and proper vector are also used in place of 
eigenvector. The corresponding terms for eigenvalue are characteristic value 
and proper value. 

Note that a vector is an eigenvector of a matrix A if and only if it is an eigenvector of $L_A$. Likewise, a scalar $\lambda$ is an eigenvalue of A if and only if it is an eigenvalue of $L_A$. Using the terminology of eigenvectors and eigenvalues, we can summarize the preceding discussion as follows.


Theorem 5.1. A linear operator T on a finite-dimensional vector space V is diagonalizable if and only if there exists an ordered basis $\beta$ for V consisting of eigenvectors of T. Furthermore, if T is diagonalizable, $\beta = \{v_1, v_2, \ldots, v_n\}$ is an ordered basis of eigenvectors of T, and $D = [T]_\beta$, then D is a diagonal matrix and $D_{jj}$ is the eigenvalue corresponding to $v_j$ for $1 \le j \le n$.

To diagonalize a matrix or a linear operator is to find a basis of eigenvectors and the corresponding eigenvalues.

Before continuing our study of the diagonalization problem, we consider 
three examples of eigenvalues and eigenvectors. 
Example 1 


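Let

$$A = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}, \quad v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad \text{and} \quad v_2 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}.$$

Since

$$L_A(v_1) = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} -2 \\ 2 \end{pmatrix} = -2\begin{pmatrix} 1 \\ -1 \end{pmatrix} = -2v_1,$$

$v_1$ is an eigenvector of $L_A$, and hence of A. Here $\lambda_1 = -2$ is the eigenvalue corresponding to $v_1$. Furthermore,

$$L_A(v_2) = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 15 \\ 20 \end{pmatrix} = 5\begin{pmatrix} 3 \\ 4 \end{pmatrix} = 5v_2,$$

and so $v_2$ is an eigenvector of $L_A$, and hence of A, with the corresponding eigenvalue $\lambda_2 = 5$. Note that $\beta = \{v_1, v_2\}$ is an ordered basis for $\mathbb{R}^2$ consisting of eigenvectors of both A and $L_A$, and therefore A and $L_A$ are diagonalizable. Moreover, by Theorem 5.1,

$$[L_A]_\beta = \begin{pmatrix} -2 & 0 \\ 0 & 5 \end{pmatrix}.$$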
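As a quick check on the arithmetic in Example 1, here is a minimal Python sketch that multiplies A by $v_1$ and $v_2$ using plain integer lists. The helper name `mat_vec` is our own and does not come from the text.

```python
def mat_vec(A, v):
    """Return the product A v for a matrix A (list of rows) and a vector v (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 3],
     [4, 2]]
v1 = [1, -1]
v2 = [3, 4]

print(mat_vec(A, v1))  # [-2, 2]  = -2 * v1
print(mat_vec(A, v2))  # [15, 20] =  5 * v2

# Each product is the stated scalar multiple of the original vector.
assert mat_vec(A, v1) == [-2 * x for x in v1]
assert mat_vec(A, v2) == [5 * x for x in v2]
```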


Example 2 


Let T be the linear operator on $\mathbb{R}^2$ that rotates each vector in the plane
through an angle of $\pi/2$. It is clear geometrically that for any nonzero vector
v, the vectors v and T(v) are not collinear; hence T(v) is not a multiple of 
v. Therefore T has no eigenvectors and, consequently, no eigenvalues. Thus 
there exist operators (and matrices) with no eigenvalues or eigenvectors. Of 
course, such operators and matrices are not diagonalizable. 


Example 3 


Let $C^\infty(\mathbb{R})$ denote the set of all functions $f\colon \mathbb{R} \to \mathbb{R}$ having derivatives of all orders. (Thus $C^\infty(\mathbb{R})$ includes the polynomial functions, the sine and cosine functions, the exponential functions, etc.) Clearly, $C^\infty(\mathbb{R})$ is a subspace of the vector space $\mathcal{F}(\mathbb{R}, \mathbb{R})$ of all functions from $\mathbb{R}$ to $\mathbb{R}$ as defined in Section 1.2. Let $T\colon C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$ be the function defined by $T(f) = f'$, the derivative of f. It is easily verified that T is a linear operator on $C^\infty(\mathbb{R})$. We determine the eigenvalues and eigenvectors of T.

Suppose that f is an eigenvector of T with corresponding eigenvalue $\lambda$. Then $f' = T(f) = \lambda f$. This is a first-order differential equation whose solutions are of the form $f(t) = ce^{\lambda t}$ for some constant c. Consequently, every real number $\lambda$ is an eigenvalue of T, and $\lambda$ corresponds to eigenvectors of the form $ce^{\lambda t}$ for $c \ne 0$. Note that for $\lambda = 0$, the eigenvectors are the nonzero constant functions.


In order to obtain a basis of eigenvectors for a matrix (or a linear opera- 
tor), we need to be able to determine its eigenvalues and eigenvectors. The 
following theorem gives us a method for computing eigenvalues. 


Theorem 5.2. Let $A \in M_{n\times n}(F)$. Then a scalar $\lambda$ is an eigenvalue of A if and only if $\det(A - \lambda I_n) = 0$.

Proof. A scalar $\lambda$ is an eigenvalue of A if and only if there exists a nonzero vector $v \in F^n$ such that $Av = \lambda v$, that is, $(A - \lambda I_n)(v) = 0$. By Theorem 2.5 (p. 71), this is true if and only if $A - \lambda I_n$ is not invertible. However, this result is equivalent to the statement that $\det(A - \lambda I_n) = 0$. ∎

Definition. Let $A \in M_{n\times n}(F)$. The polynomial $f(t) = \det(A - tI_n)$ is called the characteristic polynomial¹ of A.


Theorem 5.2 states that the eigenvalues of a matrix are the zeros of its characteristic polynomial. When determining the eigenvalues of a matrix or a linear operator, we normally compute its characteristic polynomial, as in the next example.


Example 4 


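To find the eigenvalues of

$$A = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix} \in M_{2\times 2}(\mathbb{R}),$$

we compute its characteristic polynomial:

$$\det(A - tI) = \det\begin{pmatrix} 1 - t & 1 \\ 4 & 1 - t \end{pmatrix} = t^2 - 2t - 3 = (t - 3)(t + 1).$$

It follows from Theorem 5.2 that the only eigenvalues of A are 3 and $-1$.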
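For a $2 \times 2$ matrix, expanding $(a - t)(d - t) - bc$ shows that $\det(A - tI) = t^2 - \operatorname{tr}(A)\,t + \det(A)$. The minimal Python sketch below applies that formula to the matrix of Example 4. The function names are hypothetical, and the root-finding step uses floating-point square roots, so it only illustrates the case of real zeros.

```python
import math

def char_poly_2x2(A):
    """Coefficients (c2, c1, c0) with det(A - tI) = c2*t^2 + c1*t + c0 for a 2x2 matrix A."""
    (a, b), (c, d) = A
    # (a - t)(d - t) - bc = t^2 - (a + d) t + (ad - bc)
    return (1, -(a + d), a * d - b * c)

def real_roots_quadratic(c2, c1, c0):
    """Distinct real zeros of c2*t^2 + c1*t + c0, largest first (empty list if none)."""
    disc = c1 * c1 - 4 * c2 * c0
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted({(-c1 + r) / (2 * c2), (-c1 - r) / (2 * c2)}, reverse=True)

A = [[1, 1],
     [4, 1]]
coeffs = char_poly_2x2(A)
print(coeffs)                         # (1, -2, -3), i.e. t^2 - 2t - 3
print(real_roots_quadratic(*coeffs))  # [3.0, -1.0]
```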


¹The observant reader may have noticed that the entries of the matrix $A - tI_n$ are not scalars in the field F. They are, however, scalars in another field F(t), the field of quotients of polynomials in t with coefficients from F. Consequently, any results proved about determinants in Chapter 4 remain valid in this context.

It is easily shown that similar matrices have the same characteristic polynomial (see Exercise 12). This fact enables us to define the characteristic polynomial of a linear operator as follows.


Definition. Let T be a linear operator on an n-dimensional vector space V with ordered basis $\beta$. We define the characteristic polynomial f(t) of T to be the characteristic polynomial of $A = [T]_\beta$. That is,

$$f(t) = \det(A - tI_n).$$

The remark preceding this definition shows that the definition is independent of the choice of ordered basis $\beta$. Thus if T is a linear operator on a finite-dimensional vector space V and $\beta$ is an ordered basis for V, then $\lambda$ is an eigenvalue of T if and only if $\lambda$ is an eigenvalue of $[T]_\beta$. We often denote the characteristic polynomial of an operator T by $\det(T - tI)$.


Example 5 


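Let T be the linear operator on $P_2(\mathbb{R})$ defined by $T(f(x)) = f(x) + (x + 1)f'(x)$, let $\beta$ be the standard ordered basis for $P_2(\mathbb{R})$, and let $A = [T]_\beta$. Then

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}.$$

The characteristic polynomial of T is

$$\det(A - tI_3) = \det\begin{pmatrix} 1 - t & 1 & 0 \\ 0 & 2 - t & 2 \\ 0 & 0 & 3 - t \end{pmatrix} = (1 - t)(2 - t)(3 - t) = -(t - 1)(t - 2)(t - 3).$$

Hence $\lambda$ is an eigenvalue of T (or A) if and only if $\lambda = 1$, 2, or 3.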
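Because $A - \lambda I_3$ is upper triangular here, its determinant is the product of its diagonal entries. The sketch below evaluates $\det(A - \lambda I_3)$ at a few integers and shows that, among them, it vanishes exactly at 1, 2, and 3. It uses our own helper names and integer arithmetic, and computes the determinant by cofactor expansion, which is only practical for small matrices.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (adequate for small matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def shifted(A, lam):
    """Return the matrix A - lam * I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

A = [[1, 1, 0],
     [0, 2, 2],
     [0, 0, 3]]   # [T]_beta from Example 5

for lam in range(5):
    # f(lam) = (1 - lam)(2 - lam)(3 - lam)
    print(lam, det(shifted(A, lam)))
```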


Examples 4 and 5 suggest that the characteristic polynomial of an $n \times n$ matrix A is a polynomial of degree n. The next theorem tells us even more. It can be proved by a straightforward induction argument.

Theorem 5.3. Let $A \in M_{n\times n}(F)$.
(a) The characteristic polynomial of A is a polynomial of degree n with leading coefficient $(-1)^n$.
(b) A has at most n distinct eigenvalues.

Proof. Exercise. ∎
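Theorem 5.3 is left as an exercise. As an illustration only (not a proof), the following Python sketch computes every coefficient of $\det(A - tI_n)$ with the Faddeev-LeVerrier recurrence. This is a standard technique that the text does not use; we substitute it here because it yields the whole coefficient list exactly. The list has $n + 1$ entries and its last entry is $(-1)^n$, as part (a) predicts. The names `mat_mul` and `char_poly` are our own.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def char_poly(A):
    """Coefficients [a_0, a_1, ..., a_n] of f(t) = det(A - tI), lowest degree first.

    Faddeev-LeVerrier: with c_n = 1 and M_0 = 0, for k = 1, ..., n set
    M_k = A M_{k-1} + c_{n-k+1} I and c_{n-k} = -tr(A M_k) / k; then
    det(tI - A) = sum_k c_k t^k, and det(A - tI) = (-1)^n det(tI - A).
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return [(-1) ** n * ck for ck in c]

for A in ([[1, 1], [4, 1]], [[1, 1, 0], [0, 2, 2], [0, 0, 3]]):
    coeffs = char_poly(A)
    print([str(x) for x in coeffs], "degree", len(coeffs) - 1, "leading", coeffs[-1])
```

For the matrices of Examples 4 and 5 this gives $t^2 - 2t - 3$ and $-t^3 + 6t^2 - 11t + 6$.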
Theorem 5.2 enables us to determine all the eigenvalues of a matrix or a linear operator on a finite-dimensional vector space provided that we can compute the zeros of the characteristic polynomial. Our next result gives us a procedure for determining the eigenvectors corresponding to a given eigenvalue.

Theorem 5.4. Let T be a linear operator on a vector space V, and let $\lambda$ be an eigenvalue of T. A vector $v \in V$ is an eigenvector of T corresponding to $\lambda$ if and only if $v \ne 0$ and $v \in N(T - \lambda I)$.

Proof. Exercise. ∎


Example 6 


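To find all the eigenvectors of the matrix

$$A = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix}$$

in Example 4, recall that A has two eigenvalues, $\lambda_1 = 3$ and $\lambda_2 = -1$. We begin by finding all the eigenvectors corresponding to $\lambda_1 = 3$. Let

$$B_1 = A - \lambda_1 I = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix} - \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ 4 & -2 \end{pmatrix}.$$

Then

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2$$

is an eigenvector corresponding to $\lambda_1 = 3$ if and only if $x \ne 0$ and $x \in N(L_{B_1})$; that is, $x \ne 0$ and

$$\begin{pmatrix} -2 & 1 \\ 4 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -2x_1 + x_2 \\ 4x_1 - 2x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Clearly the set of all solutions to this equation is

$$\left\{ t\begin{pmatrix} 1 \\ 2 \end{pmatrix} : t \in \mathbb{R} \right\}.$$

Hence x is an eigenvector corresponding to $\lambda_1 = 3$ if and only if

$$x = t\begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{for some } t \ne 0.$$

Now suppose that x is an eigenvector of A corresponding to $\lambda_2 = -1$. Let

$$B_2 = A - \lambda_2 I = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix} - \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix}.$$

Then

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in N(L_{B_2})$$

if and only if x is a solution to the system

$$\begin{aligned} 2x_1 + x_2 &= 0 \\ 4x_1 + 2x_2 &= 0. \end{aligned}$$

Hence

$$N(L_{B_2}) = \left\{ t\begin{pmatrix} 1 \\ -2 \end{pmatrix} : t \in \mathbb{R} \right\}.$$

Thus x is an eigenvector corresponding to $\lambda_2 = -1$ if and only if

$$x = t\begin{pmatrix} 1 \\ -2 \end{pmatrix} \quad \text{for some } t \ne 0.$$

Observe that

$$\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -2 \end{pmatrix} \right\}$$

is a basis for $\mathbb{R}^2$ consisting of eigenvectors of A. Thus $L_A$, and hence A, is diagonalizable.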
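The null-space computations in Example 6 amount to row reducing $B_1$ and $B_2$. The minimal Python sketch below carries out that procedure with exact rational arithmetic via `fractions.Fraction`. It recovers the same eigenvectors, up to a nonzero scalar. The helper names `null_space` and `integer_multiple` are hypothetical.

```python
import math
from fractions import Fraction

def null_space(M):
    """Return a basis (list of vectors) of N(L_M) for the matrix M, given as a list of rows."""
    rows = [[Fraction(x) for x in row] for row in M]
    m, n = len(rows), len(rows[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for col in range(n):
        if r == m:
            break
        # Look for a row at or below r with a nonzero entry in this column.
        p = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if p is None:
            continue  # no pivot here: this column's variable is free
        rows[r], rows[p] = rows[p], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        # Clear the column above and below the pivot (Gauss-Jordan elimination).
        for i in range(m):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    basis = []
    for f in (j for j in range(n) if j not in pivot_cols):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)  # set one free variable to 1, the others to 0
        for i, pc in enumerate(pivot_cols):
            v[pc] = -rows[i][f]
        basis.append(v)
    return basis

def integer_multiple(v):
    """Scale v to an integer vector whose first nonzero entry is positive."""
    k = math.lcm(*(x.denominator for x in v))
    w = [int(x * k) for x in v]
    first = next(x for x in w if x != 0)
    return w if first > 0 else [-x for x in w]

A = [[1, 1], [4, 1]]
for lam in (3, -1):
    B = [[A[i][j] - (lam if i == j else 0) for j in range(2)] for i in range(2)]
    print(lam, [integer_multiple(v) for v in null_space(B)])
```

Each free variable is set to 1 in turn, so the returned vectors form a basis of the null space. Scaling them with `integer_multiple` only picks a convenient representative, here $(1, 2)$ for $\lambda_1 = 3$ and $(1, -2)$ for $\lambda_2 = -1$.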


Suppose that $\beta$ is a basis for $F^n$ consisting of eigenvectors of A. The corollary to Theorem 2.23 assures us that if Q is the $n \times n$ matrix whose columns are the vectors in $\beta$, then $Q^{-1}AQ$ is a diagonal matrix. In Example 6, for instance, if

$$Q = \begin{pmatrix} 1 & 1 \\ 2 & -2 \end{pmatrix},$$

then

$$Q^{-1}AQ = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}.$$

Of course, the diagonal entries of this matrix are the eigenvalues of A that correspond to the respective columns of Q.
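The claim that $Q^{-1}AQ$ is diagonal in Example 6 can be checked with exact fractions. This sketch uses the explicit inverse of a $2 \times 2$ matrix, $\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$. The helper names are our own.

```python
from fractions import Fraction

def inverse_2x2(Q):
    """Inverse of an invertible 2x2 matrix, with exact rational entries."""
    (a, b), (c, d) = Q
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("Q is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 1], [4, 1]]
Q = [[1, 1], [2, -2]]   # columns are the eigenvectors (1, 2) and (1, -2)
D = mat_mul(mat_mul(inverse_2x2(Q), A), Q)
print([[str(x) for x in row] for row in D])  # [['3', '0'], ['0', '-1']]
```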

To find the eigenvectors of a linear operator T on an n-dimensional vector space V, select an ordered basis $\beta$ for V and let $A = [T]_\beta$. Figure 5.1 is the special case of Figure 2.2 in Section 2.4 in which V = W and $\beta = \gamma$. Recall that for $v \in V$, $\phi_\beta(v) = [v]_\beta$, the coordinate vector of v relative to $\beta$. We show that $v \in V$ is an eigenvector of T corresponding to $\lambda$ if and only if $\phi_\beta(v)$ is an eigenvector of A corresponding to $\lambda$.

Figure 5.1 (diagram: $T\colon V \to V$ across the top, $L_A\colon F^n \to F^n$ across the bottom, and $\phi_\beta\colon V \to F^n$ down each side)

Suppose that v is an eigenvector of T corresponding to $\lambda$. Then $T(v) = \lambda v$. Hence

$$A\phi_\beta(v) = L_A\phi_\beta(v) = \phi_\beta T(v) = \phi_\beta(\lambda v) = \lambda\phi_\beta(v).$$

Now $\phi_\beta(v) \ne 0$, since $\phi_\beta$ is an isomorphism; hence $\phi_\beta(v)$ is an eigenvector of A. This argument is reversible, and so we can establish that if $\phi_\beta(v)$ is an eigenvector of A corresponding to $\lambda$, then v is an eigenvector of T corresponding to $\lambda$. (See Exercise 13.)

An equivalent formulation of the result discussed in the preceding paragraph is that for an eigenvalue $\lambda$ of A (and hence of T), a vector $y \in F^n$ is an eigenvector of A corresponding to $\lambda$ if and only if $\phi_\beta^{-1}(y)$ is an eigenvector of T corresponding to $\lambda$.

Thus we have reduced the problem of finding the eigenvectors of a linear 
operator on a finite-dimensional vector space to the problem of finding the 
eigenvectors of a matrix. The next example illustrates this procedure. 


Example 7 


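Let T be the linear operator on $P_2(\mathbb{R})$ defined in Example 5, and let $\beta$ be the standard ordered basis for $P_2(\mathbb{R})$. Recall that T has eigenvalues 1, 2, and 3 and that

$$A = [T]_\beta = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}.$$

We consider each eigenvalue separately.

Let $\lambda_1 = 1$, and define

$$B_1 = A - \lambda_1 I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{pmatrix}.$$

Then

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{R}^3$$

is an eigenvector corresponding to $\lambda_1 = 1$ if and only if $x \ne 0$ and $x \in N(L_{B_1})$; that is, x is a nonzero solution to the system

$$\begin{aligned} x_2 &= 0 \\ x_2 + 2x_3 &= 0 \\ 2x_3 &= 0. \end{aligned}$$

Notice that this system has three unknowns, $x_1$, $x_2$, and $x_3$, but one of these, $x_1$, does not actually appear in the system. Since the values of $x_1$ do not affect the system, we assign $x_1$ a parametric value, say $x_1 = a$, and solve the system for $x_2$ and $x_3$. Clearly, $x_2 = x_3 = 0$, and so the eigenvectors of A corresponding to $\lambda_1 = 1$ are of the form

$$a\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = ae_1$$

for $a \ne 0$. Consequently, the eigenvectors of T corresponding to $\lambda_1 = 1$ are of the form

$$\phi_\beta^{-1}(ae_1) = a\phi_\beta^{-1}(e_1) = a \cdot 1 = a$$

for any $a \ne 0$. Hence the nonzero constant polynomials are the eigenvectors of T corresponding to $\lambda_1 = 1$.

Next let $\lambda_2 = 2$, and define

$$B_2 = A - \lambda_2 I = \begin{pmatrix} -1 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 1 \end{pmatrix}.$$

It is easily verified that

$$N(L_{B_2}) = \left\{ a\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} : a \in \mathbb{R} \right\},$$

and hence the eigenvectors of T corresponding to $\lambda_2 = 2$ are of the form

$$\phi_\beta^{-1}\left(a\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}\right) = a\phi_\beta^{-1}(e_1 + e_2) = a(1 + x)$$

for $a \ne 0$.

Finally, consider $\lambda_3 = 3$ and

$$B_3 = A - \lambda_3 I = \begin{pmatrix} -2 & 1 & 0 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$

Since

$$N(L_{B_3}) = \left\{ a\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} : a \in \mathbb{R} \right\},$$

the eigenvectors of T corresponding to $\lambda_3 = 3$ are of the form

$$\phi_\beta^{-1}\left(a\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}\right) = a\phi_\beta^{-1}(e_1 + 2e_2 + e_3) = a(1 + 2x + x^2)$$

for $a \ne 0$.

For each eigenvalue, select the corresponding eigenvector with $a = 1$ in the preceding descriptions to obtain $\gamma = \{1,\ 1 + x,\ 1 + 2x + x^2\}$, which is an ordered basis for $P_2(\mathbb{R})$ consisting of eigenvectors of T. Thus T is diagonalizable, and

$$[T]_\gamma = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$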
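Example 7 can also be checked directly in $P_2(\mathbb{R})$, without passing through coordinates. In the sketch below, which is our own illustration, a polynomial $c_0 + c_1x + c_2x^2$ is stored as the list `[c0, c1, c2]`, and T is the operator $f \mapsto f + (x + 1)f'$ of Example 5.

```python
def derivative(p):
    """Derivative of the polynomial p, stored as coefficients [c0, c1, c2, ...]."""
    return [k * p[k] for k in range(1, len(p))]

def add(p, q):
    """Sum of two coefficient lists of possibly different lengths."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def T(p):
    """T(f) = f + (x + 1) f' on P2(R), returned as a length-3 coefficient list."""
    d = derivative(p)
    x_plus_1_times_d = add([0] + d, d)   # (x + 1) d = x d + d
    result = add(p, x_plus_1_times_d)
    return (result + [0, 0, 0])[:3]      # pad to [c0, c1, c2]; T maps P2(R) into itself

# The basis gamma = {1, 1 + x, 1 + 2x + x^2} with its eigenvalues.
gamma = {1: [1, 0, 0], 2: [1, 1, 0], 3: [1, 2, 1]}
for lam, p in gamma.items():
    print(lam, T(p))
    assert T(p) == [lam * c for c in p]
```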


We close this section with a geometric description of how a linear operator T acts on an eigenvector in the context of a vector space V over $\mathbb{R}$. Let v be an eigenvector of T and $\lambda$ be the corresponding eigenvalue. We can think of $W = \operatorname{span}(\{v\})$, the one-dimensional subspace of V spanned by v, as a line in V that passes through 0 and v. For any $w \in W$, $w = cv$ for some scalar c, and hence

$$T(w) = T(cv) = cT(v) = c\lambda v = \lambda w;$$

so T acts on the vectors in W by multiplying each such vector by $\lambda$. There are several possible ways for T to act on the vectors in W, depending on the value of $\lambda$. We consider several cases. (See Figure 5.2.)

CASE 1. If $\lambda > 1$, then T moves vectors in W farther from 0 by a factor of $\lambda$.

CASE 2. If $\lambda = 1$, then T acts as the identity operator on W.

CASE 3. If $0 < \lambda < 1$, then T moves vectors in W closer to 0 by a factor of $\lambda$.

CASE 4. If $\lambda = 0$, then T acts as the zero transformation on W.

CASE 5. If $\lambda < 0$, then T reverses the orientation of W; that is, T moves vectors in W from one side of 0 to the other.




Figure 5.2: The action of T on $W = \operatorname{span}(\{x\})$ when x is an eigenvector of T (panels: Case 1: $\lambda > 1$; Case 2: $\lambda = 1$; Case 3: $0 < \lambda < 1$; Case 4: $\lambda = 0$; Case 5: $\lambda < 0$).


To illustrate these ideas, we consider the linear operators in Examples 3, 4, and 2 of Section 2.1.

For the operator T on $\mathbb{R}^2$ defined by $T(a_1, a_2) = (a_1, -a_2)$, the reflection about the x-axis, $e_1$ and $e_2$ are eigenvectors of T with corresponding eigenvalues 1 and $-1$, respectively. Since $e_1$ and $e_2$ span the x-axis and the y-axis, respectively, T acts as the identity on the x-axis and reverses the orientation of the y-axis.

For the operator T on $\mathbb{R}^2$ defined by $T(a_1, a_2) = (a_1, 0)$, the projection on the x-axis, $e_1$ and $e_2$ are eigenvectors of T with corresponding eigenvalues 1 and 0, respectively. Thus, T acts as the identity on the x-axis and as the zero operator on the y-axis.

Finally, we generalize Example 2 of this section by considering the operator that rotates the plane through the angle $\theta$, which is defined by

$$T_\theta(a_1, a_2) = (a_1\cos\theta - a_2\sin\theta,\ a_1\sin\theta + a_2\cos\theta).$$

Suppose that $0 < \theta < \pi$. Then for any nonzero vector v, the vectors v and $T_\theta(v)$ are not collinear, and hence $T_\theta$ maps no one-dimensional subspace of $\mathbb{R}^2$ into itself. But this implies that $T_\theta$ has no eigenvectors and therefore no eigenvalues. To confirm this conclusion, we note that the characteristic polynomial of $T_\theta$ is

$$\det(T_\theta - tI) = \det\begin{pmatrix} \cos\theta - t & -\sin\theta \\ \sin\theta & \cos\theta - t \end{pmatrix} = t^2 - (2\cos\theta)t + 1,$$

which has no real zeros because, for $0 < \theta < \pi$, the discriminant $4\cos^2\theta - 4$ is negative.
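The sign of the discriminant can be checked numerically for a few sample angles in $(0, \pi)$. This floating-point sketch is an illustration, not a proof, and its function names are hypothetical.

```python
import math

def rotation_char_poly(theta):
    """(c2, c1, c0) with det(T_theta - tI) = t^2 - (2 cos theta) t + 1."""
    return (1.0, -2.0 * math.cos(theta), 1.0)

def discriminant(c2, c1, c0):
    """Discriminant c1^2 - 4 c2 c0 of c2*t^2 + c1*t + c0."""
    return c1 * c1 - 4.0 * c2 * c0

for theta in (math.pi / 6, math.pi / 2, 2 * math.pi / 3):
    d = discriminant(*rotation_char_poly(theta))
    # d = 4 cos^2(theta) - 4 < 0, so there are no real eigenvalues.
    print(f"theta = {theta:.4f}: discriminant = {d:.4f}")
```

The three printed discriminants are approximately $-1$, $-4$, and $-3$.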


EXERCISES 


1. Label the following statements as true or false.
(a) Every linear operator on an n-dimensional vector space has n distinct eigenvalues.
(b) If a real matrix has one eigenvector, then it has an infinite number of eigenvectors.
(c) There exists a square matrix with no eigenvectors.
(d) Eigenvalues must be nonzero scalars.
(e) Any two eigenvectors are linearly independent.
(f) The sum of two eigenvalues of a linear operator T is also an eigenvalue of T.
(g) Linear operators on infinite-dimensional vector spaces never have eigenvalues.
(h) An $n \times n$ matrix A with entries from a field F is similar to a diagonal matrix if and only if there is a basis for $F^n$ consisting of eigenvectors of A.
(i) Similar matrices always have the same eigenvalues.
(j) Similar matrices always have the same eigenvectors.
(k) The sum of two eigenvectors of an operator T is always an eigenvector of T.


2. For each of the following linear operators T on a vector space V and ordered bases $\beta$, compute $[T]_\beta$, and determine whether $\beta$ is a basis consisting of eigenvectors of T.

(a)

v=ne.1(6) = (1907 i) ma e= {(2).(2)} 


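(b) $V = P_1(\mathbb{R})$, $T(a + bx) = (6a - 6b) + (12a - 11b)x$, and $\beta = \{3 + 4x,\ 2 + 3x\}$

(c) $V = \mathbb{R}^3$, $T\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 3a + 2b - 2c \\ -4a - 3b + 2c \\ -c \end{pmatrix}$, and $\beta = \left\{ \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} \right\}$

(d) $V = P_2(\mathbb{R})$, $T(a + bx + cx^2) = (-4a + 2b - 2c) - (7a + 3b + 7c)x + (7a + b + 5c)x^2$, and $\beta = \{x - x^2,\ -1 + x^2,\ -1 - x + x^2\}$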
(e) $V = P_3(\mathbb{R})$, $T(a + bx + cx^2 + dx^3) = d + (-c + d)x + (a + b - 2c)x^2 + (-b + c - 2d)x^3$, and $\beta = \{1 - x + x^3,\ 1 + x^2,\ 1,\ x + x^2\}$
a b —T7a—4b+4c—4d 6b 
(f) V=Maxa(R), T (2 a eer cea i) 94 


p={(i (a a) 0) Co a} 


3. For each of the following matrices $A \in M_{n\times n}(F)$,
(i) Determine all the eigenvalues of A.
(ii) For each eigenvalue $\lambda$ of A, find the set of eigenvectors corresponding to $\lambda$.
(iii) If possible, find a basis for $F^n$ consisting of eigenvectors of A.
(iv) If successful in finding such a basis, determine an invertible matrix Q and a diagonal matrix D such that $Q^{-1}AQ = D$.


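(a) $A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}$ for $F = \mathbb{R}$

(b) $A = \begin{pmatrix} 0 & -2 & -3 \\ -1 & 1 & -1 \\ 2 & 2 & 5 \end{pmatrix}$ for $F = \mathbb{R}$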
a 1 
(c) A=(5 3) for F=C 
(d) $A = \begin{pmatrix} 2 & 0 & -1 \\ 4 & 1 & -4 \\ 2 & 0 & -1 \end{pmatrix}$ for $F = \mathbb{R}$


4. For each linear operator T on V, find the eigenvalues of T and an ordered basis $\beta$ for V such that $[T]_\beta$ is a diagonal matrix.

(a) $V = \mathbb{R}^2$ and $T(a, b) = (-2a + 3b,\ -10a + 9b)$

(b) V=R?® and T(a,b,c) = ee tb eta 3b + 8c, —2a + b — 2c) 

(c) $V = \mathbb{R}^3$ and $T(a, b, c) = (-4a + 3b - 6c,\ 6a - 7b + 12c,\ 6a - 6b + 11c)$

(d) $V = P_1(\mathbb{R})$ and $T(ax + b) = (-6a + 2b)x + (-6a + b)$
(e) V= = P.(R) and T(f(x)) = xf"(a) + f(2)a + f(3) 
(f) V = Ps(R) and T(f(#)) = fs: )+ fQ)x 

(g) V=Ps(R) and T(f(#)) = af"(x) + f"(@) — fF) 

(h) V = Mox2(R) and T 6 ‘) = (: : 
(i) V = Max2(R) and T(¢ SG )
(j) $V = M_{2\times 2}(\mathbb{R})$ and $T(A) = A^t + 2\cdot\operatorname{tr}(A)\cdot I_2$


5. Prove Theorem 5.4.

6. Let T be a linear operator on a finite-dimensional vector space V, and let $\beta$ be an ordered basis for V. Prove that $\lambda$ is an eigenvalue of T if and only if $\lambda$ is an eigenvalue of $[T]_\beta$.

7. Let T be a linear operator on a finite-dimensional vector space V. We define the determinant of T, denoted det(T), as follows: Choose any ordered basis $\beta$ for V, and define $\det(T) = \det([T]_\beta)$.
(a) Prove that the preceding definition is independent of the choice of an ordered basis for V. That is, prove that if $\beta$ and $\gamma$ are two ordered bases for V, then $\det([T]_\beta) = \det([T]_\gamma)$.
(b) Prove that T is invertible if and only if $\det(T) \ne 0$.
(c) Prove that if T is invertible, then $\det(T^{-1}) = [\det(T)]^{-1}$.
(d) Prove that if U is also a linear operator on V, then $\det(TU) = \det(T)\cdot\det(U)$.
(e) Prove that $\det(T - \lambda I_V) = \det([T]_\beta - \lambda I)$ for any scalar $\lambda$ and any ordered basis $\beta$ for V.

8. (a) Prove that a linear operator T on a finite-dimensional vector space is invertible if and only if zero is not an eigenvalue of T.
(b) Let T be an invertible linear operator. Prove that a scalar $\lambda$ is an eigenvalue of T if and only if $\lambda^{-1}$ is an eigenvalue of $T^{-1}$.
(c) State and prove results analogous to (a) and (b) for matrices.

9. Prove that the eigenvalues of an upper triangular matrix M are the diagonal entries of M.

10. Let V be a finite-dimensional vector space, and let $\lambda$ be any scalar.
(a) For any ordered basis $\beta$ for V, prove that $[\lambda I_V]_\beta = \lambda I$.
(b) Compute the characteristic polynomial of $\lambda I_V$.
(c) Show that $\lambda I_V$ is diagonalizable and has only one eigenvalue.


11. A scalar matrix is a square matrix of the form $\lambda I$ for some scalar $\lambda$; that is, a scalar matrix is a diagonal matrix in which all the diagonal entries are equal.
(a) Prove that if a square matrix A is similar to a scalar matrix $\lambda I$, then $A = \lambda I$.
(b) Show that a diagonalizable matrix having only one eigenvalue is a scalar matrix.
(c) Prove that $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is not diagonalizable.

12. (a) Prove that similar matrices have the same characteristic polynomial.
(b) Show that the definition of the characteristic polynomial of a linear operator on a finite-dimensional vector space V is independent of the choice of basis for V.

13. Let T be a linear operator on a finite-dimensional vector space V over a field F, let $\beta$ be an ordered basis for V, and let $A = [T]_\beta$. In reference to Figure 5.1, prove the following.
(a) If $v \in V$ and $\phi_\beta(v)$ is an eigenvector of A corresponding to the eigenvalue $\lambda$, then v is an eigenvector of T corresponding to $\lambda$.
(b) If $\lambda$ is an eigenvalue of A (and hence of T), then a vector $y \in F^n$ is an eigenvector of A corresponding to $\lambda$ if and only if $\phi_\beta^{-1}(y)$ is an eigenvector of T corresponding to $\lambda$.


14.† For any square matrix A, prove that A and $A^t$ have the same characteristic polynomial (and hence the same eigenvalues).

15.† (a) Let T be a linear operator on a vector space V, and let x be an eigenvector of T corresponding to the eigenvalue $\lambda$. For any positive integer m, prove that x is an eigenvector of $T^m$ corresponding to the eigenvalue $\lambda^m$.
(b) State and prove the analogous result for matrices.

16. (a) Prove that similar matrices have the same trace. Hint: Use Exercise 13 of Section 2.3.
(b) How would you define the trace of a linear operator on a finite-dimensional vector space? Justify that your definition is well-defined.

17. Let T be the linear operator on $M_{n\times n}(\mathbb{R})$ defined by $T(A) = A^t$.
(a) Show that $\pm 1$ are the only eigenvalues of T.
(b) Describe the eigenvectors corresponding to each eigenvalue of T.
(c) Find an ordered basis $\beta$ for $M_{2\times 2}(\mathbb{R})$ such that $[T]_\beta$ is a diagonal matrix.
(d) Find an ordered basis $\beta$ for $M_{n\times n}(\mathbb{R})$ such that $[T]_\beta$ is a diagonal matrix for $n > 2$.

18. Let $A, B \in M_{n\times n}(\mathbb{C})$.
(a) Prove that if B is invertible, then there exists a scalar $c \in \mathbb{C}$ such that $A + cB$ is not invertible. Hint: Examine $\det(A + cB)$.
(b) Find nonzero $2 \times 2$ matrices A and B such that both A and $A + cB$ are invertible for all $c \in \mathbb{C}$.

19.† Let A and B be similar $n \times n$ matrices. Prove that there exists an n-dimensional vector space V, a linear operator T on V, and ordered bases $\beta$ and $\gamma$ for V such that $A = [T]_\beta$ and $B = [T]_\gamma$. Hint: Use Exercise 14 of Section 2.5.


20. Let A be an $n \times n$ matrix with characteristic polynomial

$$f(t) = (-1)^n t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0.$$

Prove that $f(0) = a_0 = \det(A)$. Deduce that A is invertible if and only if $a_0 \ne 0$.


21. Let A and f(t) be as in Exercise 20.
(a) Prove that $f(t) = (A_{11} - t)(A_{22} - t)\cdots(A_{nn} - t) + q(t)$, where q(t) is a polynomial of degree at most $n - 2$. Hint: Apply mathematical induction to n.
(b) Show that $\operatorname{tr}(A) = (-1)^{n-1}a_{n-1}$.


22.† (a) Let T be a linear operator on a vector space V over the field F, and let g(t) be a polynomial with coefficients from F. Prove that if x is an eigenvector of T with corresponding eigenvalue $\lambda$, then $g(T)(x) = g(\lambda)x$. That is, x is an eigenvector of g(T) with corresponding eigenvalue $g(\lambda)$.
(b) State and prove a comparable result for matrices.
(c) Verify (b) for the matrix A in Exercise 3(a) with polynomial $g(t) = 2t^2 - t + 1$, eigenvector $x = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$, and corresponding eigenvalue $\lambda = 4$.


23. Use Exercise 22 to prove that if f(t) is the characteristic polynomial of a diagonalizable linear operator T, then $f(T) = T_0$, the zero operator. (In Section 5.4 we prove that this result does not depend on the diagonalizability of T.)


24. Use Exercise 21(a) to prove Theorem 5.3. 
25. Prove Corollaries 1 and 2 of Theorem 5.3. 


26. Determine the number of distinct characteristic polynomials of matrices in $M_{2\times 2}(Z_2)$.
5.2 DIAGONALIZABILITY


In Section 5.1, we presented the diagonalization problem and observed that 
not all linear operators or matrices are diagonalizable. Although we are able 
to diagonalize operators and matrices and even obtain a necessary and sufficient
condition for diagonalizability (Theorem 5.1, p. 246), we have not yet
solved the diagonalization problem. What is still needed is a simple test to 
determine whether an operator or a matrix can be diagonalized, as well as a 
method for actually finding a basis of eigenvectors. In this section, we develop 
such a test and method. 

In Example 6 of Section 5.1, we obtained a basis of eigenvectors by choos- 
ing one eigenvector corresponding to each eigenvalue. In general, such a 
procedure does not yield a basis, but the following theorem shows that any 
set constructed in this manner is linearly independent. 


Theorem 5.5. Let T be a linear operator on a vector space V, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be distinct eigenvalues of T. If $v_1, v_2, \ldots, v_k$ are eigenvectors of T such that $\lambda_i$ corresponds to $v_i$ $(1 \le i \le k)$, then $\{v_1, v_2, \ldots, v_k\}$ is linearly independent.

Proof. The proof is by mathematical induction on k. Suppose that k = 1. Then $v_1 \ne 0$ since $v_1$ is an eigenvector, and hence $\{v_1\}$ is linearly independent. Now assume that the theorem holds for $k - 1$ distinct eigenvalues, where $k - 1 \ge 1$, and that we have k eigenvectors $v_1, v_2, \ldots, v_k$ corresponding to the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. We wish to show that $\{v_1, v_2, \ldots, v_k\}$ is linearly independent. Suppose that $a_1, a_2, \ldots, a_k$ are scalars such that

$$a_1v_1 + a_2v_2 + \cdots + a_kv_k = 0. \tag{1}$$

Applying $T - \lambda_k I$ to both sides of (1), we obtain

$$a_1(\lambda_1 - \lambda_k)v_1 + a_2(\lambda_2 - \lambda_k)v_2 + \cdots + a_{k-1}(\lambda_{k-1} - \lambda_k)v_{k-1} = 0.$$

By the induction hypothesis $\{v_1, v_2, \ldots, v_{k-1}\}$ is linearly independent, and hence

$$a_1(\lambda_1 - \lambda_k) = a_2(\lambda_2 - \lambda_k) = \cdots = a_{k-1}(\lambda_{k-1} - \lambda_k) = 0.$$

Since $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct, it follows that $\lambda_i - \lambda_k \ne 0$ for $1 \le i \le k - 1$. So $a_1 = a_2 = \cdots = a_{k-1} = 0$, and (1) therefore reduces to $a_kv_k = 0$. But $v_k \ne 0$ and therefore $a_k = 0$. Consequently $a_1 = a_2 = \cdots = a_k = 0$, and it follows that $\{v_1, v_2, \ldots, v_k\}$ is linearly independent. ∎
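Theorem 5.5 can be illustrated with the eigenvectors found in Example 7 of Section 5.1, which correspond to the distinct eigenvalues 1, 2, and 3. In the minimal sketch below (helper names ours), their coordinate vectors are placed as the columns of a matrix. A nonzero determinant confirms that they are linearly independent.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

# Coordinate vectors, relative to {1, x, x^2}, of the eigenvectors 1, 1 + x, 1 + 2x + x^2
# from Example 7 of Section 5.1 (eigenvalues 1, 2, 3).
eigvecs = [[1, 0, 0], [1, 1, 0], [1, 2, 1]]
# Put the vectors in the columns of a 3 x 3 matrix.
M = [[v[i] for v in eigvecs] for i in range(3)]
print(M)       # [[1, 1, 1], [0, 1, 2], [0, 0, 1]]
print(det(M))  # 1, nonzero, so the three vectors are linearly independent
```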


Corollary. Let T be a linear operator on an n-dimensional vector space V. If T has n distinct eigenvalues, then T is diagonalizable.

Proof. Suppose that T has n distinct eigenvalues $\lambda_1, \ldots, \lambda_n$. For each i choose an eigenvector $v_i$ corresponding to $\lambda_i$. By Theorem 5.5, $\{v_1, \ldots, v_n\}$ is linearly independent, and since $\dim(V) = n$, this set is a basis for V. Thus, by Theorem 5.1 (p. 246), T is diagonalizable. ∎
Example 1 

Let

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \in M_{2\times 2}(\mathbb{R}).$$

The characteristic polynomial of A (and hence of $L_A$) is

$$\det(A - tI) = \det\begin{pmatrix} 1 - t & 1 \\ 1 & 1 - t \end{pmatrix} = t(t - 2),$$

and thus the eigenvalues of $L_A$ are 0 and 2. Since $L_A$ is a linear operator on the two-dimensional vector space $\mathbb{R}^2$, we conclude from the preceding corollary that $L_A$ (and hence A) is diagonalizable.
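A quick Python check of this example follows. It applies the $2 \times 2$ formula $\det(A - tI) = t^2 - \operatorname{tr}(A)\,t + \det(A)$ and then verifies two eigenvectors, $(1, -1)$ for 0 and $(1, 1)$ for 2. We chose these vectors for illustration; the text does not list them. The function names are our own.

```python
def char_poly_2x2(A):
    """(c2, c1, c0) with det(A - tI) = c2*t^2 + c1*t + c0 for a 2x2 matrix A."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

def mat_vec(A, v):
    """Return the product A v."""
    return [sum(x * y for x, y in zip(row, v)) for row in A]

A = [[1, 1], [1, 1]]
print(char_poly_2x2(A))  # (1, -2, 0): t^2 - 2t = t(t - 2)

# Eigenvectors chosen for illustration, one for each distinct eigenvalue.
for lam, v in ((0, [1, -1]), (2, [1, 1])):
    assert mat_vec(A, v) == [lam * x for x in v]

# Columns (1, -1) and (1, 1); a nonzero determinant means they form a basis of R^2.
Q = [[1, 1], [-1, 1]]
print(Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0])  # 2
```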


The converse of the corollary to Theorem 5.5 is false. That is, it is not true that if T is diagonalizable, then it has n distinct eigenvalues. For example, the identity operator is diagonalizable even though it has only one eigenvalue, namely, $\lambda = 1$.

We have seen that diagonalizability requires the existence of eigenvalues. 
Actually, diagonalizability imposes a stronger condition on the characteristic 
polynomial. 


Definition. A polynomial f(t) in P(F) splits over F if there are scalars $c, a_1, \ldots, a_n$ (not necessarily distinct) in F such that

$$f(t) = c(t - a_1)(t - a_2)\cdots(t - a_n).$$

For example, $t^2 - 1 = (t + 1)(t - 1)$ splits over $\mathbb{R}$, but $(t^2 + 1)(t - 2)$ does not split over $\mathbb{R}$ because $t^2 + 1$ cannot be factored into a product of linear factors. However, $(t^2 + 1)(t - 2)$ does split over $\mathbb{C}$ because it factors into the product $(t + i)(t - i)(t - 2)$. If f(t) is the characteristic polynomial of a linear operator or a matrix over a field F, then the statement that f(t) splits is understood to mean that it splits over F.


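Theorem 5.6. The characteristic polynomial of any diagonalizable linear operator splits.

Proof. Let T be a diagonalizable linear operator on the n-dimensional vector space V, and let $\beta$ be an ordered basis for V such that $[T]_\beta = D$ is a diagonal matrix. Suppose that

$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix},$$

and let f(t) be the characteristic polynomial of T. Then

$$\begin{aligned} f(t) = \det(D - tI) &= \det\begin{pmatrix} \lambda_1 - t & 0 & \cdots & 0 \\ 0 & \lambda_2 - t & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n - t \end{pmatrix} \\ &= (\lambda_1 - t)(\lambda_2 - t)\cdots(\lambda_n - t) = (-1)^n(t - \lambda_1)(t - \lambda_2)\cdots(t - \lambda_n). \end{aligned}$$

∎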


From this theorem, it is clear that if T is a diagonalizable linear operator 
on an n-dimensional vector space that fails to have distinct eigenvalues, then 
the characteristic polynomial of T must have repeated zeros. 

The converse of Theorem 5.6 is false; that is, the characteristic polynomial 
of T may split, but T need not be diagonalizable. (See Example 3, which 
follows.) The following concept helps us determine when an operator whose 
characteristic polynomial splits is diagonalizable. 


Definition. Let $\lambda$ be an eigenvalue of a linear operator or matrix with characteristic polynomial f(t). The (algebraic) multiplicity of $\lambda$ is the largest positive integer k for which $(t - \lambda)^k$ is a factor of f(t).

Example 2
Let

$$A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 4 \end{pmatrix},$$

which has characteristic polynomial $f(t) = -(t - 3)^2(t - 4)$. Hence $\lambda = 3$ is an eigenvalue of A with multiplicity 2, and $\lambda = 4$ is an eigenvalue of A with multiplicity 1.


If T is a diagonalizable linear operator on a finite-dimensional vector space V, then there is an ordered basis $\beta$ for V consisting of eigenvectors of T. We know from Theorem 5.1 (p. 246) that $[T]_\beta$ is a diagonal matrix in which the diagonal entries are the eigenvalues of T. Since the characteristic polynomial of T is $\det([T]_\beta - tI)$, it is easily seen that each eigenvalue of T must occur as a diagonal entry of $[T]_\beta$ exactly as many times as its multiplicity. Hence $\beta$ contains as many (linearly independent) eigenvectors corresponding to an eigenvalue as the multiplicity of that eigenvalue. So the number of linearly independent eigenvectors corresponding to a given eigenvalue is of interest in determining whether an operator can be diagonalized. Recalling from Theorem 5.4 (p. 250) that the eigenvectors of T corresponding to the eigenvalue $\lambda$ are the nonzero vectors in the null space of $T - \lambda I$, we are led naturally to the study of this set.


Definition. Let T be a linear operator on a vector space V, and let $\lambda$ be an eigenvalue of T. Define $E_\lambda = \{x \in V : T(x) = \lambda x\} = N(T - \lambda I_V)$. The set $E_\lambda$ is called the eigenspace of T corresponding to the eigenvalue $\lambda$. Analogously, we define the eigenspace of a square matrix A to be the eigenspace of $L_A$.

Clearly, $E_\lambda$ is a subspace of V consisting of the zero vector and the eigenvectors of T corresponding to the eigenvalue $\lambda$. The maximum number of linearly independent eigenvectors of T corresponding to the eigenvalue $\lambda$ is therefore the dimension of $E_\lambda$. Our next result relates this dimension to the multiplicity of $\lambda$.


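Theorem 5.7. Let T be a linear operator on a finite-dimensional vector space V, and let $\lambda$ be an eigenvalue of T having multiplicity m. Then $1 \le \dim(E_\lambda) \le m$.

Proof. Choose an ordered basis $\{v_1, v_2, \ldots, v_p\}$ for $E_\lambda$, extend it to an ordered basis $\beta = \{v_1, v_2, \ldots, v_p, v_{p+1}, \ldots, v_n\}$ for V, and let $A = [T]_\beta$. Since $\lambda$ is an eigenvalue of T, $E_\lambda$ contains a nonzero vector, so $p = \dim(E_\lambda) \ge 1$. Observe that $v_i$ $(1 \le i \le p)$ is an eigenvector of T corresponding to $\lambda$, and therefore

$$A = \begin{pmatrix} \lambda I_p & B \\ O & C \end{pmatrix}.$$

By Exercise 21 of Section 4.3, the characteristic polynomial of T is

$$\begin{aligned} f(t) = \det(A - tI_n) &= \det\begin{pmatrix} (\lambda - t)I_p & B \\ O & C - tI_{n-p} \end{pmatrix} \\ &= \det((\lambda - t)I_p)\,\det(C - tI_{n-p}) \\ &= (\lambda - t)^p g(t), \end{aligned}$$

where g(t) is a polynomial. Thus $(\lambda - t)^p$ is a factor of f(t), and hence the multiplicity of $\lambda$ is at least p. But $\dim(E_\lambda) = p$, and so $\dim(E_\lambda) \le m$. ∎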


Example 3 


Let T be the linear operator on $P_2(\mathbb{R})$ defined by $T(f(x)) = f'(x)$. The matrix representation of T with respect to the standard ordered basis $\beta$ for $P_2(\mathbb{R})$ is

$$[T]_\beta = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$

Consequently, the characteristic polynomial of T is

$$\det([T]_\beta - tI) = \det\begin{pmatrix} -t & 1 & 0 \\ 0 & -t & 2 \\ 0 & 0 & -t \end{pmatrix} = -t^3.$$

Thus T has only one eigenvalue ($\lambda = 0$) with multiplicity 3. Solving $T(f(x)) = f'(x) = 0$ shows that $E_\lambda = N(T - \lambda I) = N(T)$ is the subspace of $P_2(\mathbb{R})$ consisting of the constant polynomials. So $\{1\}$ is a basis for $E_\lambda$, and therefore $\dim(E_\lambda) = 1$. Consequently, there is no basis for $P_2(\mathbb{R})$ consisting of eigenvectors of T, and therefore T is not diagonalizable.


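Example 4
Let T be the linear operator on $R^3$ defined by

$$T\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} 4a_1 + a_3 \\ 2a_1 + 3a_2 + 2a_3 \\ a_1 + 4a_3 \end{pmatrix}.$$

We determine the eigenspace of T corresponding to each eigenvalue. Let $\beta$
be the standard ordered basis for $R^3$. Then

$$[T]_\beta = \begin{pmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{pmatrix},$$

and hence the characteristic polynomial of T is

$$\det([T]_\beta - tI) = \det\begin{pmatrix} 4-t & 0 & 1 \\ 2 & 3-t & 2 \\ 1 & 0 & 4-t \end{pmatrix} = -(t-5)(t-3)^2.$$

So the eigenvalues of T are $\lambda_1 = 5$ and $\lambda_2 = 3$ with multiplicities 1 and 2,
respectively.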


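Since

$$E_{\lambda_1} = N(T - \lambda_1 I) = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} -1 & 0 & 1 \\ 2 & -2 & 2 \\ 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right\},$$

$E_{\lambda_1}$ is the solution space of the system of linear equations

$$\begin{aligned} -x_1 + x_3 &= 0 \\ 2x_1 - 2x_2 + 2x_3 &= 0 \\ x_1 - x_3 &= 0. \end{aligned}$$

It is easily seen (using the techniques of Chapter 3) that

$$\left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \right\}$$

is a basis for $E_{\lambda_1}$. Hence $\dim(E_{\lambda_1}) = 1$.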


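Similarly, $E_{\lambda_2} = N(T - \lambda_2 I)$ is the solution space of the system

$$\begin{aligned} x_1 + x_3 &= 0 \\ 2x_1 + 2x_3 &= 0 \\ x_1 + x_3 &= 0. \end{aligned}$$

Since the unknown $x_2$ does not appear in this system, we assign it a parametric value, say, $x_2 = s$, and solve the system for $x_1$ and $x_3$, introducing
another parameter t. The result is the general solution to the system

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = s\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad \text{for } s, t \in R.$$

It follows that

$$\left\{ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right\}$$

is a basis for $E_{\lambda_2}$, and $\dim(E_{\lambda_2}) = 2$.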


In this case, the multiplicity of each eigenvalue $\lambda_i$ is equal to the dimension
of the corresponding eigenspace $E_{\lambda_i}$. Observe that the union of the two bases
just derived, namely,

$$\left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right\},$$

is linearly independent and hence is a basis for $R^3$ consisting of eigenvectors
of T. Consequently, T is diagonalizable.
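The claims in Example 4 can be checked mechanically. The Python sketch below (plain integer arithmetic; the helper names `matvec` and `det3` are our own, not the text's) confirms that each listed vector v satisfies $Av = \lambda v$ for $A = [T]_\beta$, and that the matrix having these vectors as columns has nonzero determinant, so the three eigenvectors form a basis for $R^3$.

```python
# Check Example 4: each vector is an eigenvector of A = [T]_beta, and the
# three vectors together form a basis of R^3 (nonzero determinant).

A = [[4, 0, 1],
     [2, 3, 2],
     [1, 0, 4]]

# (eigenvalue, eigenvector) pairs found in the example
eigenpairs = [(5, [1, 2, 1]), (3, [0, 1, 0]), (3, [-1, 0, 1])]


def matvec(M, v):
    """Return the product M v for a matrix M given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


for lam, v in eigenpairs:
    assert matvec(A, v) == [lam * x for x in v], (lam, v)

# Q has the eigenvectors as its columns
Q = [list(row) for row in zip(*(v for _, v in eigenpairs))]
print("det(Q) =", det3(Q))  # prints det(Q) = 2, so the vectors form a basis
```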
Examples 3 and 4 suggest that an operator whose characteristic polynomial
splits is diagonalizable if and only if the dimension of each eigenspace
is equal to the multiplicity of the corresponding eigenvalue. This is indeed 
true, as we now show. We begin with the following lemma, which is a slight 
variation of Theorem 5.5. 


Lemma. Let T be a linear operator, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be distinct
eigenvalues of T. For each $i = 1, 2, \ldots, k$, let $v_i \in E_{\lambda_i}$, the eigenspace corresponding to $\lambda_i$. If

$$v_1 + v_2 + \cdots + v_k = 0,$$

then $v_i = 0$ for all i.

Proof. Suppose otherwise. By renumbering if necessary, suppose that, for
some m with $1 \le m \le k$, we have $v_i \ne 0$ for $1 \le i \le m$, and $v_i = 0$ for $i > m$. Then, for
each $i \le m$, $v_i$ is an eigenvector of T corresponding to $\lambda_i$ and

$$v_1 + v_2 + \cdots + v_m = 0.$$

But this contradicts Theorem 5.5, which states that these $v_i$'s are linearly
independent. We conclude, therefore, that $v_i = 0$ for all i. ■


Theorem 5.8. Let T be a linear operator on a vector space V, and let
$\lambda_1, \lambda_2, \ldots, \lambda_k$ be distinct eigenvalues of T. For each $i = 1, 2, \ldots, k$, let $S_i$
be a finite linearly independent subset of the eigenspace $E_{\lambda_i}$. Then $S =
S_1 \cup S_2 \cup \cdots \cup S_k$ is a linearly independent subset of V.

Proof. Suppose that for each i

$$S_i = \{v_{i1}, v_{i2}, \ldots, v_{in_i}\}.$$

Then $S = \{v_{ij} : 1 \le j \le n_i, \text{ and } 1 \le i \le k\}$. Consider any scalars $\{a_{ij}\}$ such
that

$$\sum_{i=1}^{k} \sum_{j=1}^{n_i} a_{ij} v_{ij} = 0.$$

For each i, let

$$w_i = \sum_{j=1}^{n_i} a_{ij} v_{ij}.$$

Then $w_i \in E_{\lambda_i}$ for each i, and $w_1 + \cdots + w_k = 0$. Therefore, by the lemma,
$w_i = 0$ for all i. But each $S_i$ is linearly independent, and hence $a_{ij} = 0$ for
all i and j. We conclude that S is linearly independent. ■
Theorem 5.8 tells us how to construct a linearly independent subset of
eigenvectors, namely, by collecting bases for the individual eigenspaces. The 
next theorem tells us when the resulting set is a basis for the entire space. 
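The construction described above can be sketched in code: compute a basis of each eigenspace $E_\lambda = N(A - \lambda I)$ and concatenate the bases. The Python sketch below works exactly over the rationals with `fractions.Fraction`. It assumes the distinct eigenvalues are already known (it does not find them), and the function names are hypothetical helpers of our own. By Theorem 5.8 the collected vectors are linearly independent, so they form a basis for $F^n$ exactly when there are n of them.

```python
from fractions import Fraction


def null_space_basis(M):
    """Basis of N(M) via reduced row echelon form over the rationals.

    One basis vector is produced for each free (non-pivot) column.
    """
    m = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -m[row][free]
        basis.append(v)
    return basis


def eigenvector_collection(A, eigenvalues):
    """Concatenate bases of E_lambda = N(A - lambda I) for the given distinct eigenvalues."""
    n = len(A)
    collected = []
    for lam in eigenvalues:
        shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
        collected.extend(null_space_basis(shifted))
    return collected


A4 = [[4, 0, 1], [2, 3, 2], [1, 0, 4]]  # [T]_beta from Example 4
D3 = [[0, 1, 0], [0, 0, 2], [0, 0, 0]]  # [T]_beta from Example 3

for name, M, eigs in [("Example 4", A4, [5, 3]), ("Example 3", D3, [0])]:
    vecs = eigenvector_collection(M, eigs)
    verdict = "basis" if len(vecs) == len(M) else "not a basis"
    print(name, [[int(x) for x in v] for v in vecs], verdict)
```

For Example 4 this recovers the vectors (1, 2, 1), (0, 1, 0), (-1, 0, 1) found above; for Example 3 it finds only the constant polynomial's coordinate vector (1, 0, 0), one vector short of a basis.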


Theorem 5.9. Let T be a linear operator on a finite-dimensional vector
space V such that the characteristic polynomial of T splits. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$
be the distinct eigenvalues of T. Then

(a) T is diagonalizable if and only if the multiplicity of $\lambda_i$ is equal to
$\dim(E_{\lambda_i})$ for all i.

(b) If T is diagonalizable and $\beta_i$ is an ordered basis for $E_{\lambda_i}$ for each i, then
$\beta = \beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$ is an ordered basis† for V consisting of eigenvectors
of T.


Proof. For each i, let $m_i$ denote the multiplicity of $\lambda_i$, $d_i = \dim(E_{\lambda_i})$, and
$n = \dim(V)$.

First, suppose that T is diagonalizable. Let $\beta$ be a basis for V consisting
of eigenvectors of T. For each i, let $\beta_i = \beta \cap E_{\lambda_i}$, the set of vectors in $\beta$ that
are eigenvectors corresponding to $\lambda_i$, and let $n_i$ denote the number of vectors
in $\beta_i$. Then $n_i \le d_i$ for each i because $\beta_i$ is a linearly independent subset of
a subspace of dimension $d_i$, and $d_i \le m_i$ by Theorem 5.7. The $n_i$'s sum to n
because $\beta$ contains n vectors. The $m_i$'s also sum to n because the degree of
the characteristic polynomial of T is equal to the sum of the multiplicities of
the eigenvalues. Thus

$$n = \sum_{i=1}^{k} n_i \le \sum_{i=1}^{k} d_i \le \sum_{i=1}^{k} m_i = n.$$

It follows that

$$\sum_{i=1}^{k} (m_i - d_i) = 0.$$

Since $(m_i - d_i) \ge 0$ for all i, we conclude that $m_i = d_i$ for all i.

Conversely, suppose that $m_i = d_i$ for all i. We simultaneously show that
T is diagonalizable and prove (b). For each i, let $\beta_i$ be an ordered basis for
$E_{\lambda_i}$, and let $\beta = \beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$. By Theorem 5.8, $\beta$ is linearly independent.
Furthermore, since $d_i = m_i$ for all i, $\beta$ contains

$$\sum_{i=1}^{k} d_i = \sum_{i=1}^{k} m_i = n$$

vectors. Therefore $\beta$ is an ordered basis for V consisting of eigenvectors of T,
and we conclude that T is diagonalizable. ■

† We regard $\beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$ as an ordered basis in the natural way: the vectors
in $\beta_1$ are listed first (in the same order as in $\beta_1$), then the vectors in $\beta_2$ (in the same
order as in $\beta_2$), etc.


This theorem completes our study of the diagonalization problem. We 
summarize our results. 


Test for Diagonalization 


Let T be a linear operator on an n-dimensional vector space V. Then T 
is diagonalizable if and only if both of the following conditions hold. 


1. The characteristic polynomial of T splits. 
2. For each eigenvalue $\lambda$ of T, the multiplicity of $\lambda$ equals $n - \operatorname{rank}(T - \lambda I)$.


These same conditions can be used to test if a square matrix A is diagonalizable because diagonalizability of A is equivalent to diagonalizability of the
operator $L_A$.

If T is a diagonalizable operator and $\beta_1, \beta_2, \ldots, \beta_k$ are ordered bases for
the eigenspaces of T, then the union $\beta = \beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$ is an ordered basis
for V consisting of eigenvectors of T, and hence $[T]_\beta$ is a diagonal matrix.


When testing T for diagonalizability, it is usually easiest to choose a convenient basis $\alpha$ for V and work with $B = [T]_\alpha$. If the characteristic polynomial
of B splits, then use condition 2 above to check if the multiplicity of each
repeated eigenvalue of B equals $n - \operatorname{rank}(B - \lambda I)$. (By Theorem 5.7, condition 2 is automatically satisfied for eigenvalues of multiplicity 1, since then
$1 \le \dim(E_\lambda) \le 1$.) If so, then B, and hence T, is diagonalizable.

If T is diagonalizable and a basis $\beta$ for V consisting of eigenvectors of T
is desired, then we first find a basis for each eigenspace of B. The union of
these bases is a basis $\gamma$ for $F^n$ consisting of eigenvectors of B. Each vector
in $\gamma$ is the coordinate vector relative to $\alpha$ of an eigenvector of T. The set
consisting of these n eigenvectors of T is the desired basis $\beta$.

Furthermore, if A is an $n \times n$ diagonalizable matrix, we can use the corollary to Theorem 2.23 (p. 115) to find an invertible $n \times n$ matrix Q and a
diagonal $n \times n$ matrix D such that $Q^{-1}AQ = D$. The matrix Q has as its
columns the vectors in a basis of eigenvectors of A, and D has as its jth
diagonal entry the eigenvalue of A corresponding to the jth column of Q.
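Condition 2 of the test reduces to rank computations, which can be sketched in Python. The sketch below is a minimal illustration under stated assumptions: the caller supplies each distinct eigenvalue with its multiplicity (so condition 1, splitting, is taken as already verified, and the multiplicities must add up to n), and `rank` and `passes_diagonalization_test` are hypothetical helper names of our own. Exact rational arithmetic (`fractions.Fraction`) avoids the rounding problems a floating-point rank would have.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(m[0])):
        # look for a nonzero pivot in column c at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def passes_diagonalization_test(A, multiplicities):
    """Check condition 2 of the test for diagonalization for the square matrix A.

    `multiplicities` maps each distinct eigenvalue to its multiplicity; it is
    assumed to come from a characteristic polynomial that splits.
    """
    n = len(A)
    if sum(multiplicities.values()) != n:
        raise ValueError("multiplicities must add up to n (the polynomial must split)")
    for lam, mult in multiplicities.items():
        shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
        if n - rank(shifted) != mult:
            return False
    return True


# The matrix of Example 5: eigenvalues 4 (multiplicity 1) and 3 (multiplicity 2)
print(passes_diagonalization_test([[3, 1, 0], [0, 3, 0], [0, 0, 4]], {4: 1, 3: 2}))  # False
# The matrix B of Example 6: eigenvalues 1 (multiplicity 2) and 2 (multiplicity 1)
print(passes_diagonalization_test([[1, 1, 1], [0, 1, 0], [0, 1, 2]], {1: 2, 2: 1}))  # True
```

Checking eigenvalues of multiplicity 1 as well is redundant by Theorem 5.7, but it keeps the function short and costs only one extra rank computation per such eigenvalue.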

We now consider some examples illustrating the preceding ideas. 


Example 5
We test the matrix

$$A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}$$

for diagonalizability.

The characteristic polynomial of A is $\det(A - tI) = -(t-4)(t-3)^2$, which
splits, and so condition 1 of the test for diagonalization is satisfied. Also A
has eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 3$ with multiplicities 1 and 2, respectively.
Since $\lambda_1$ has multiplicity 1, condition 2 is satisfied for $\lambda_1$. Thus we need only
test condition 2 for $\lambda_2$. Because

$$A - \lambda_2 I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

has rank 2, we see that $3 - \operatorname{rank}(A - \lambda_2 I) = 1$, which is not the multiplicity
of $\lambda_2$. Thus condition 2 fails for $\lambda_2$, and A is therefore not diagonalizable.


Example 6
Let T be the linear operator on $P_2(R)$ defined by

$$T(f(x)) = f(1) + f'(0)x + (f'(0) + f''(0))x^2.$$

We first test T for diagonalizability. Let $\alpha$ denote the standard ordered basis
for $P_2(R)$ and $B = [T]_\alpha$. Then

$$B = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 2 \end{pmatrix}.$$

The characteristic polynomial of B, and hence of T, is $-(t-1)^2(t-2)$, which
splits. Hence condition 1 of the test for diagonalization is satisfied. Also B
has the eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 2$ with multiplicities 2 and 1, respectively.
Condition 2 is satisfied for $\lambda_2$ because it has multiplicity 1. So we need only
verify condition 2 for $\lambda_1 = 1$. For this case,

$$3 - \operatorname{rank}(B - \lambda_1 I) = 3 - \operatorname{rank}\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} = 3 - 1 = 2,$$

which is equal to the multiplicity of $\lambda_1$. Therefore T is diagonalizable.


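We now find an ordered basis $\gamma$ for $R^3$ of eigenvectors of B. We consider
each eigenvalue separately.

The eigenspace corresponding to $\lambda_1 = 1$ is

$$E_{\lambda_1} = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0 \right\},$$

which is the solution space for the system

$$x_2 + x_3 = 0,$$

and has

$$\gamma_1 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right\}$$

as a basis.

The eigenspace corresponding to $\lambda_2 = 2$ is

$$E_{\lambda_2} = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in R^3 : \begin{pmatrix} -1 & 1 & 1 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0 \right\},$$

which is the solution space for the system

$$\begin{aligned} -x_1 + x_2 + x_3 &= 0 \\ x_2 &= 0, \end{aligned}$$

and has

$$\gamma_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}$$

as a basis.

Let

$$\gamma = \gamma_1 \cup \gamma_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right\}.$$

Then $\gamma$ is an ordered basis for $R^3$ consisting of eigenvectors of B.

Finally, observe that the vectors in $\gamma$ are the coordinate vectors relative
to $\alpha$ of the vectors in the set

$$\beta = \{1, -x + x^2, 1 + x^2\},$$

which is an ordered basis for $P_2(R)$ consisting of eigenvectors of T. Thus

$$[T]_\beta = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$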
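As a check on Example 6, the sketch below forms Q with the vectors of $\gamma$ as its columns and D with the corresponding eigenvalues on its diagonal, then confirms $BQ = QD$ and $\det(Q) \ne 0$; together these two facts are equivalent to $Q^{-1}BQ = D$. It uses plain Python integers, and the helper names `matmul` and `det3` are our own.

```python
# Check Example 6: with Q built from gamma, BQ = QD and Q is invertible.

B = [[1, 1, 1],
     [0, 1, 0],
     [0, 1, 2]]
gamma = [[1, 0, 0], [0, -1, 1], [1, 0, 1]]   # eigenvectors of B, in order
eigenvalues = [1, 1, 2]                       # matching eigenvalues

Q = [list(row) for row in zip(*gamma)]        # columns of Q are the vectors in gamma
D = [[eigenvalues[i] if i == j else 0 for j in range(3)] for i in range(3)]


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


print(matmul(B, Q) == matmul(Q, D), det3(Q))  # True -1
```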
Our next example is an application of diagonalization that is of interest
in Section 5.3. 


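Example 7
Let

$$A = \begin{pmatrix} 0 & -2 \\ 1 & 3 \end{pmatrix}.$$

We show that A is diagonalizable and find a $2 \times 2$ matrix Q such that $Q^{-1}AQ$
is a diagonal matrix. We then show how to use this result to compute $A^n$ for
any positive integer n.

First observe that the characteristic polynomial of A is $(t-1)(t-2)$, and
hence A has two distinct eigenvalues, $\lambda_1 = 1$ and $\lambda_2 = 2$. By applying the
corollary to Theorem 5.5 to the operator $L_A$, we see that A is diagonalizable. Moreover,

$$\gamma_1 = \left\{ \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\} \quad \text{and} \quad \gamma_2 = \left\{ \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\}$$

are bases for the eigenspaces $E_{\lambda_1}$ and $E_{\lambda_2}$, respectively. Therefore

$$\gamma = \gamma_1 \cup \gamma_2 = \left\{ \begin{pmatrix} -2 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\}$$

is an ordered basis for $R^2$ consisting of eigenvectors of A. Let

$$Q = \begin{pmatrix} -2 & -1 \\ 1 & 1 \end{pmatrix},$$

the matrix whose columns are the vectors in $\gamma$. Then, by the corollary to
Theorem 2.23 (p. 115),

$$D = Q^{-1}AQ = [L_A]_\gamma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$

To find $A^n$ for any positive integer n, observe that $A = QDQ^{-1}$. Therefore

$$\begin{aligned}
A^n &= (QDQ^{-1})^n \\
&= (QDQ^{-1})(QDQ^{-1})\cdots(QDQ^{-1}) \\
&= QD^nQ^{-1} \\
&= \begin{pmatrix} -2 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2^n \end{pmatrix}\begin{pmatrix} -1 & -1 \\ 1 & 2 \end{pmatrix} \\
&= \begin{pmatrix} 2 - 2^n & 2 - 2^{n+1} \\ -1 + 2^n & -1 + 2^{n+1} \end{pmatrix}.
\end{aligned}$$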
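The closed form for $A^n$ in Example 7 can be spot-checked against repeated multiplication. In the Python sketch below (helper names ours), `Q_inv` is $Q^{-1}$ computed by hand from $\det Q = -1$, and `power_by_diagonalization` evaluates $QD^nQ^{-1}$ with $D^n = \operatorname{diag}(1, 2^n)$. Agreement for n = 1, ..., 10 is a sanity check, not a proof.

```python
# Compare A^n by repeated multiplication with Q D^n Q^{-1} for Example 7.

A = [[0, -2], [1, 3]]
Q = [[-2, -1], [1, 1]]        # columns: eigenvectors for eigenvalues 1 and 2
Q_inv = [[-1, -1], [1, 2]]    # inverse of Q (det Q = -1)


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def power_by_multiplication(M, n):
    """M^n for n >= 1 by n - 1 successive multiplications."""
    result = M
    for _ in range(n - 1):
        result = matmul(result, M)
    return result


def power_by_diagonalization(n):
    """Q D^n Q^{-1}, using D^n = diag(1, 2^n)."""
    D_n = [[1, 0], [0, 2 ** n]]
    return matmul(matmul(Q, D_n), Q_inv)


def closed_form(n):
    """The formula for A^n derived in Example 7."""
    return [[2 - 2 ** n, 2 - 2 ** (n + 1)], [-1 + 2 ** n, -1 + 2 ** (n + 1)]]


for n in range(1, 11):
    assert power_by_multiplication(A, n) == power_by_diagonalization(n) == closed_form(n)
print(closed_form(10))  # [[-1022, -2046], [1023, 2047]]
```

For large n the diagonalized form needs only one power of a scalar, $2^n$, instead of n - 1 matrix products.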
We now consider an application that uses diagonalization to solve a system
of differential equations. 


Systems of Differential Equations 


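Consider the system of differential equations

$$\begin{aligned}
x_1' &= 3x_1 + x_2 + x_3 \\
x_2' &= 2x_1 + 4x_2 + 2x_3 \\
x_3' &= -x_1 - x_2 + x_3,
\end{aligned}$$

where, for each i, $x_i = x_i(t)$ is a differentiable real-valued function of the
real variable t. Clearly, this system has a solution, namely, the solution in
which each $x_i(t)$ is the zero function. We determine all of the solutions to
this system.

Let $x\colon R \to R^3$ be the function defined by

$$x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix},$$

and let $x'$ denote the function obtained by differentiating each component of x. Let

$$A = \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix}$$

be the coefficient matrix of the given system, so that we can rewrite the
system as the matrix equation $x' = Ax$.

It can be verified that for

$$Q = \begin{pmatrix} -1 & 0 & -1 \\ 0 & -1 & -2 \\ 1 & 1 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix},$$

we have $Q^{-1}AQ = D$. Substitute $A = QDQ^{-1}$ into $x' = Ax$ to obtain
$x' = QDQ^{-1}x$ or, equivalently, $Q^{-1}x' = DQ^{-1}x$. The function $y\colon R \to R^3$
defined by $y(t) = Q^{-1}x(t)$ can be shown to be differentiable, and $y' = Q^{-1}x'$
(see Exercise 16). Hence the original system can be written as $y' = Dy$.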
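Since D is a diagonal matrix, the system $y' = Dy$ is easy to solve. Setting

$$y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix},$$

we can rewrite $y' = Dy$ as

$$\begin{pmatrix} y_1'(t) \\ y_2'(t) \\ y_3'(t) \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}\begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \end{pmatrix} = \begin{pmatrix} 2y_1(t) \\ 2y_2(t) \\ 4y_3(t) \end{pmatrix}.$$

The three equations

$$y_1' = 2y_1, \qquad y_2' = 2y_2, \qquad y_3' = 4y_3$$

are independent of each other, and thus can be solved individually. It is
easily seen (as in Example 3 of Section 5.1) that the general solution to these
equations is $y_1(t) = c_1e^{2t}$, $y_2(t) = c_2e^{2t}$, and $y_3(t) = c_3e^{4t}$, where $c_1$, $c_2$, and
$c_3$ are arbitrary constants. Finally,

$$x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix} = Qy(t) = \begin{pmatrix} -1 & 0 & -1 \\ 0 & -1 & -2 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} c_1e^{2t} \\ c_2e^{2t} \\ c_3e^{4t} \end{pmatrix} = \begin{pmatrix} -c_1e^{2t} - c_3e^{4t} \\ -c_2e^{2t} - 2c_3e^{4t} \\ c_1e^{2t} + c_2e^{2t} + c_3e^{4t} \end{pmatrix}$$

yields the general solution of the original system. Note that this solution can
be written as

$$x(t) = e^{2t}\left[ c_1\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + c_2\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right] + e^{4t}\left[ c_3\begin{pmatrix} -1 \\ -2 \\ 1 \end{pmatrix} \right].$$

The expressions in brackets are arbitrary vectors in $E_{\lambda_1}$ and $E_{\lambda_2}$, respectively,
where $\lambda_1 = 2$ and $\lambda_2 = 4$. Thus the general solution of the original system is
$x(t) = e^{2t}z_1 + e^{4t}z_2$, where $z_1 \in E_{\lambda_1}$ and $z_2 \in E_{\lambda_2}$. This result is generalized
in Exercise 15.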
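The two facts this derivation rests on can be checked numerically: that $AQ = QD$ (equivalent to $Q^{-1}AQ = D$ since Q is invertible), and that $x(t) = Qy(t)$ satisfies $x'(t) = Ax(t)$. The Python sketch below uses `math.exp` and the exact derivative $x' = Qy'$ (valid because Q is constant); the constants $c_1, c_2, c_3$ and the sample times are arbitrary values of our choosing, and the helper names are ours. A tiny residual at a few points is evidence, not a proof.

```python
import math

A = [[3, 1, 1], [2, 4, 2], [-1, -1, 1]]
Q = [[-1, 0, -1], [0, -1, -2], [1, 1, 1]]
D = [[2, 0, 0], [0, 2, 0], [0, 0, 4]]


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, v):
    """Product of a matrix M (list of rows) with a vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]


# AQ = QD is equivalent to Q^{-1}AQ = D because Q is invertible.
assert matmul(A, Q) == matmul(Q, D)


def x(t, c):
    """General solution x(t) = Q y(t) with y = (c1 e^{2t}, c2 e^{2t}, c3 e^{4t})."""
    y = [c[0] * math.exp(2 * t), c[1] * math.exp(2 * t), c[2] * math.exp(4 * t)]
    return matvec(Q, y)


def x_prime(t, c):
    """Exact derivative of x, namely Q y'(t), since Q is constant."""
    y_prime = [2 * c[0] * math.exp(2 * t), 2 * c[1] * math.exp(2 * t),
               4 * c[2] * math.exp(4 * t)]
    return matvec(Q, y_prime)


def residual(t, c):
    """Largest entry of |x'(t) - A x(t)|; zero up to rounding for a true solution."""
    return max(abs(p - q) for p, q in zip(x_prime(t, c), matvec(A, x(t, c))))


print(residual(0.7, (1.0, -2.0, 0.5)))  # a value at the level of floating-point rounding
```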


Direct Sums* 


Let T be a linear operator on a finite-dimensional vector space V. There 
is a way of decomposing V into simpler subspaces that offers insight into the 
behavior of T. This approach is especially useful in Chapter 7, where we study
nondiagonalizable linear operators. In the case of diagonalizable operators, 
the simpler subspaces are the eigenspaces of the operator. 


Definition. Let $W_1, W_2, \ldots, W_k$ be subspaces of a vector space V. We
define the sum of these subspaces to be the set

$$\{v_1 + v_2 + \cdots + v_k : v_i \in W_i \text{ for } 1 \le i \le k\},$$

which we denote by $W_1 + W_2 + \cdots + W_k$ or $\sum_{i=1}^{k} W_i$.


It is a simple exercise to show that the sum of subspaces of a vector space 
is also a subspace. 


Example 8

Let $V = R^3$, let $W_1$ denote the xy-plane, and let $W_2$ denote the yz-plane.
Then $R^3 = W_1 + W_2$ because, for any vector $(a, b, c) \in R^3$, we have

$$(a, b, c) = (a, 0, 0) + (0, b, c),$$

where $(a, 0, 0) \in W_1$ and $(0, b, c) \in W_2$.


Notice that in Example 8 the representation of $(a, b, c)$ as a sum of vectors
in $W_1$ and $W_2$ is not unique. For example, $(a, b, c) = (a, b, 0) + (0, 0, c)$ is
another representation. Because we are often interested in sums for which
representations are unique, we introduce a condition that assures this outcome. The definition of direct sum that follows is a generalization of the
definition given in the exercises of Section 1.3.


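Definition. Let $W_1, W_2, \ldots, W_k$ be subspaces of a vector space V. We
call V the direct sum of the subspaces $W_1, W_2, \ldots, W_k$ and write $V =
W_1 \oplus W_2 \oplus \cdots \oplus W_k$, if

$$V = \sum_{i=1}^{k} W_i$$

and

$$W_j \cap \sum_{i \ne j} W_i = \{0\} \quad \text{for each } j \ (1 \le j \le k).$$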
Example 9

Let $V = R^4$, $W_1 = \{(a, b, 0, 0) : a, b \in R\}$, $W_2 = \{(0, 0, c, 0) : c \in R\}$, and
$W_3 = \{(0, 0, 0, d) : d \in R\}$. For any $(a, b, c, d) \in V$,

$$(a, b, c, d) = (a, b, 0, 0) + (0, 0, c, 0) + (0, 0, 0, d) \in W_1 + W_2 + W_3.$$

Thus

$$V = \sum_{i=1}^{3} W_i.$$

To show that V is the direct sum of $W_1$, $W_2$, and $W_3$, we must prove that
$W_1 \cap (W_2 + W_3) = W_2 \cap (W_1 + W_3) = W_3 \cap (W_1 + W_2) = \{0\}$. But these
equalities are obvious, and so $V = W_1 \oplus W_2 \oplus W_3$.


Our next result contains several conditions that are equivalent to the 
definition of a direct sum. 


Theorem 5.10. Let $W_1, W_2, \ldots, W_k$ be subspaces of a finite-dimensional
vector space V. The following conditions are equivalent.

(a) $V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$.

(b) $V = \sum_{i=1}^{k} W_i$ and, for any vectors $v_1, v_2, \ldots, v_k$ such that $v_i \in W_i$
$(1 \le i \le k)$, if $v_1 + v_2 + \cdots + v_k = 0$, then $v_i = 0$ for all i.

(c) Each vector $v \in V$ can be uniquely written as $v = v_1 + v_2 + \cdots + v_k$,
where $v_i \in W_i$.

(d) If $\gamma_i$ is an ordered basis for $W_i$ $(1 \le i \le k)$, then $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is an
ordered basis for V.

(e) For each $i = 1, 2, \ldots, k$, there exists an ordered basis $\gamma_i$ for $W_i$ such
that $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is an ordered basis for V.


Proof. Assume (a). We prove (b). Clearly

$$V = \sum_{i=1}^{k} W_i.$$

Now suppose that $v_1, v_2, \ldots, v_k$ are vectors such that $v_i \in W_i$ for all i and
$v_1 + v_2 + \cdots + v_k = 0$. Then for any j

$$-v_j = \sum_{i \ne j} v_i \in \sum_{i \ne j} W_i.$$

But $-v_j \in W_j$ and hence

$$-v_j \in W_j \cap \sum_{i \ne j} W_i = \{0\}.$$

So $v_j = 0$, proving (b).

Now assume (b). We prove (c). Let $v \in V$. By (b), there exist vectors
$v_1, v_2, \ldots, v_k$ such that $v_i \in W_i$ and $v = v_1 + v_2 + \cdots + v_k$. We must show
that this representation is unique. Suppose also that $v = w_1 + w_2 + \cdots + w_k$,
where $w_i \in W_i$ for all i. Then

$$(v_1 - w_1) + (v_2 - w_2) + \cdots + (v_k - w_k) = 0.$$

But $v_i - w_i \in W_i$ for all i, and therefore $v_i - w_i = 0$ for all i by (b). Thus
$v_i = w_i$ for all i, proving the uniqueness of the representation.


Now assume (c). We prove (d). For each i, let $\gamma_i$ be an ordered basis for
$W_i$. Since

$$V = \sum_{i=1}^{k} W_i$$

by (c), it follows that $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ generates V. To show that this
set is linearly independent, consider vectors $v_{ij} \in \gamma_i$ ($j = 1, 2, \ldots, m_i$ and
$i = 1, 2, \ldots, k$) and scalars $a_{ij}$ such that

$$\sum_{i,j} a_{ij} v_{ij} = 0.$$

For each i, set

$$w_i = \sum_{j=1}^{m_i} a_{ij} v_{ij}.$$

Then for each i, $w_i \in \operatorname{span}(\gamma_i) = W_i$ and

$$w_1 + w_2 + \cdots + w_k = \sum_{i,j} a_{ij} v_{ij} = 0.$$

Since $0 \in W_i$ for each i and $0 + 0 + \cdots + 0 = w_1 + w_2 + \cdots + w_k$, (c) implies
that $w_i = 0$ for all i. Thus

$$0 = w_i = \sum_{j=1}^{m_i} a_{ij} v_{ij}$$

for each i. But each $\gamma_i$ is linearly independent, and hence $a_{ij} = 0$ for all i
and j. Consequently $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is linearly independent and therefore
is a basis for V.

Clearly (e) follows immediately from (d). 

Finally, we assume (e) and prove (a). For each i, let $\gamma_i$ be an ordered
basis for $W_i$ such that $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is an ordered basis for V. Then

$$V = \operatorname{span}(\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k) = \operatorname{span}(\gamma_1) + \operatorname{span}(\gamma_2) + \cdots + \operatorname{span}(\gamma_k) = \sum_{i=1}^{k} W_i$$

by repeated applications of Exercise 14 of Section 1.4. Fix j $(1 \le j \le k)$, and
suppose that, for some nonzero vector $v \in V$,

$$v \in W_j \cap \sum_{i \ne j} W_i.$$

Then

$$v \in W_j = \operatorname{span}(\gamma_j) \quad \text{and} \quad v \in \sum_{i \ne j} W_i = \operatorname{span}\Big(\bigcup_{i \ne j} \gamma_i\Big).$$

Hence v is a nontrivial linear combination of both $\gamma_j$ and $\bigcup_{i \ne j} \gamma_i$, so that
v can be expressed as a linear combination of $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ in more than
one way. But these representations contradict Theorem 1.8 (p. 43), and so
we conclude that

$$W_j \cap \sum_{i \ne j} W_i = \{0\},$$

proving (a). ■
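When $V = F^n$, Theorem 5.10 makes the direct-sum condition computable: by (a) implies (d) and (e) implies (a), it suffices to choose any basis of each $W_i$ and check whether the concatenated list is a basis of $F^n$. The Python sketch below assumes each supplied list really is a basis (linearly independent) of its $W_i$; the lists are concatenated with repeats kept, matching the ordered-basis convention for unions. The names `rank` and `is_direct_sum` are our own hypothetical helpers.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_direct_sum(bases, n):
    """Test condition (d) of Theorem 5.10 for subspaces W_i of F^n.

    `bases[i]` is assumed to be a basis of W_i.  F^n is the direct sum of the
    W_i exactly when the concatenated list is a basis of F^n.
    """
    vectors = [v for basis in bases for v in basis]  # concatenate, keeping repeats
    if not vectors:
        return n == 0
    return len(vectors) == n and rank(vectors) == n


# Example 8: xy-plane and yz-plane in R^3 (their sum is R^3, but it is not direct)
print(is_direct_sum([[[1, 0, 0], [0, 1, 0]], [[0, 1, 0], [0, 0, 1]]], 3))  # False
# Example 9: W1, W2, W3 in R^4
print(is_direct_sum([[[1, 0, 0, 0], [0, 1, 0, 0]], [[0, 0, 1, 0]], [[0, 0, 0, 1]]], 4))  # True
```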


With the aid of Theorem 5.10, we are able to characterize diagonalizability 
in terms of direct sums. 


Theorem 5.11. A linear operator T on a finite-dimensional vector space
V is diagonalizable if and only if V is the direct sum of the eigenspaces of T.

Proof. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct eigenvalues of T.

First suppose that T is diagonalizable, and for each i choose an ordered
basis $\gamma_i$ for the eigenspace $E_{\lambda_i}$. By Theorem 5.9, $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is a basis
for V, and hence V is a direct sum of the $E_{\lambda_i}$'s by Theorem 5.10.

Conversely, suppose that V is a direct sum of the eigenspaces of T. For
each i, choose an ordered basis $\gamma_i$ of $E_{\lambda_i}$. By Theorem 5.10, the union
$\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is a basis for V. Since this basis consists of eigenvectors of
T, we conclude that T is diagonalizable. ■


Example 10
Let T be the linear operator on $R^4$ defined by

$$T(a, b, c, d) = (a, b, 2c, 3d).$$

It is easily seen that T is diagonalizable with eigenvalues $\lambda_1 = 1$, $\lambda_2 = 2$,
and $\lambda_3 = 3$. Furthermore, the corresponding eigenspaces coincide with the
subspaces $W_1$, $W_2$, and $W_3$ of Example 9. Thus Theorem 5.11 provides us
with another proof that $R^4 = W_1 \oplus W_2 \oplus W_3$.


EXERCISES 


1. Label the following statements as true or false. 


(a) Any linear operator on an n-dimensional vector space that has 
fewer than n distinct eigenvalues is not diagonalizable. 

(b) Two distinct eigenvectors corresponding to the same eigenvalue 
are always linearly dependent. 

(c) If $\lambda$ is an eigenvalue of a linear operator T, then each vector in $E_\lambda$
is an eigenvector of T.

(d) If $\lambda_1$ and $\lambda_2$ are distinct eigenvalues of a linear operator T, then
$E_{\lambda_1} \cap E_{\lambda_2} = \{0\}$.

(e) Let $A \in M_{n \times n}(F)$ and $\beta = \{v_1, v_2, \ldots, v_n\}$ be an ordered basis for
$F^n$ consisting of eigenvectors of A. If Q is the $n \times n$ matrix whose
jth column is $v_j$ $(1 \le j \le n)$, then $Q^{-1}AQ$ is a diagonal matrix.

(f) A linear operator T on a finite-dimensional vector space is diagonalizable if and only if the multiplicity of each eigenvalue $\lambda$ equals
the dimension of $E_\lambda$.

(g) Every diagonalizable linear operator on a nonzero vector space has 
at least one eigenvalue. 


The following two items relate to the optional subsection on direct sums. 


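(h) If a vector space is the direct sum of subspaces $W_1, W_2, \ldots, W_k$,
then $W_i \cap W_j = \{0\}$ for $i \ne j$.

(i) If

$$V = \sum_{i=1}^{k} W_i \quad \text{and} \quad W_i \cap W_j = \{0\} \text{ for } i \ne j,$$

then $V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$.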


2. For each of the following matrices $A \in M_{n \times n}(R)$, test A for diagonalizability, and if A is diagonalizable, find an invertible matrix Q and a
diagonal matrix D such that $Q^{-1}AQ = D$.


of) 6) 663 


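$$\text{(d)}\ \begin{pmatrix} 7 & -4 & 0 \\ 8 & -5 & 0 \\ 6 & -6 & 3 \end{pmatrix} \qquad \text{(e)}\ \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix} \qquad \text{(f)}\ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 3 \end{pmatrix}$$

$$\text{(g)}\ \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix}$$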


3. For each of the following linear operators T on a vector space V, test
T for diagonalizability, and if T is diagonalizable, find a basis $\beta$ for V
such that $[T]_\beta$ is a diagonal matrix.

(a) $V = P_3(R)$ and T is defined by $T(f(x)) = f(x) + f''(x)$, respectively.

(b) $V = P_2(R)$ and T is defined by $T(ax^2 + bx + c) = cx^2 + bx + a$.

(c) $V = R^3$ and T is defined by

$$T\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_2 \\ -a_1 \\ 2a_3 \end{pmatrix}.$$

(d) $V = P_2(R)$ and T is defined by $T(f(x)) = f(0) + f(1)(x + x^2)$.

(e) $V = C^2$ and T is defined by $T(z, w) = (z + iw, iz + w)$.

(f) $V = M_{2 \times 2}(R)$ and T is defined by $T(A) = A^t$.


4. Prove the matrix version of the corollary to Theorem 5.5: If $A \in
M_{n \times n}(F)$ has n distinct eigenvalues, then A is diagonalizable.


5. State and prove the matrix version of Theorem 5.6.


6. (a) Justify the test for diagonalizability and the method for diagonalization stated in this section.
(b) Formulate the results in (a) for matrices.


7. For

$$A = \begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix} \in M_{2 \times 2}(R),$$

find an expression for $A^n$, where n is an arbitrary positive integer.


8. Suppose that $A \in M_{n \times n}(F)$ has two distinct eigenvalues, $\lambda_1$ and $\lambda_2$,
and that $\dim(E_{\lambda_1}) = n - 1$. Prove that A is diagonalizable.


9. Let T be a linear operator on a finite-dimensional vector space V, and
suppose there exists an ordered basis $\beta$ for V such that $[T]_\beta$ is an upper
triangular matrix.


(a) Prove that the characteristic polynomial for T splits. 
(b) State and prove an analogous result for matrices. 


The converse of (a) is treated in Exercise 32 of Section 5.4. 


10. Let T be a linear operator on a finite-dimensional vector space V with
the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ and corresponding multiplicities
$m_1, m_2, \ldots, m_k$. Suppose that $\beta$ is a basis for V such that $[T]_\beta$ is an
upper triangular matrix. Prove that the diagonal entries of $[T]_\beta$ are
$\lambda_1, \lambda_2, \ldots, \lambda_k$ and that each $\lambda_i$ occurs $m_i$ times $(1 \le i \le k)$.


11. Let A be an $n \times n$ matrix that is similar to an upper triangular matrix and has the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ with corresponding
multiplicities $m_1, m_2, \ldots, m_k$. Prove the following statements.

(a) $\operatorname{tr}(A) = \sum_{i=1}^{k} m_i\lambda_i$
(b) $\det(A) = (\lambda_1)^{m_1}(\lambda_2)^{m_2}\cdots(\lambda_k)^{m_k}$.


12. Let T be an invertible linear operator on a finite-dimensional vector
space V.

(a) Recall that for any eigenvalue $\lambda$ of T, $\lambda^{-1}$ is an eigenvalue of $T^{-1}$
(Exercise 8 of Section 5.1). Prove that the eigenspace of T corresponding to $\lambda$ is the same as the eigenspace of $T^{-1}$ corresponding
to $\lambda^{-1}$.

(b) Prove that if T is diagonalizable, then $T^{-1}$ is diagonalizable.


13. Let $A \in M_{n \times n}(F)$. Recall from Exercise 14 of Section 5.1 that A and
$A^t$ have the same characteristic polynomial and hence share the same
eigenvalues with the same multiplicities. For any eigenvalue $\lambda$ of A and
$A^t$, let $E_\lambda$ and $E'_\lambda$ denote the corresponding eigenspaces for A and $A^t$,
respectively.

(a) Show by way of example that for a given common eigenvalue, these
two eigenspaces need not be the same.

(b) Prove that for any eigenvalue $\lambda$, $\dim(E_\lambda) = \dim(E'_\lambda)$.

(c) Prove that if A is diagonalizable, then $A^t$ is also diagonalizable.


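14. Find the general solution to each system of differential equations.

$$\text{(a)}\ \begin{aligned} x' &= x + y \\ y' &= 3x - y \end{aligned} \qquad \text{(b)}\ \begin{aligned} x_1' &= 8x_1 + 10x_2 \\ x_2' &= -5x_1 - 7x_2 \end{aligned}$$

$$\text{(c)}\ \begin{aligned} x_1' &= x_1 + x_3 \\ x_2' &= x_2 + x_3 \\ x_3' &= 2x_3 \end{aligned}$$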
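15. Let

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

be the coefficient matrix of the system of differential equations

$$\begin{aligned}
x_1' &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
x_2' &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\;\;\vdots \\
x_n' &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n.
\end{aligned}$$

Suppose that A is diagonalizable and that the distinct eigenvalues of A
are $\lambda_1, \lambda_2, \ldots, \lambda_k$. Prove that a differentiable function $x\colon R \to R^n$ is a
solution to the system if and only if x is of the form

$$x(t) = e^{\lambda_1 t}z_1 + e^{\lambda_2 t}z_2 + \cdots + e^{\lambda_k t}z_k,$$

where $z_i \in E_{\lambda_i}$ for $i = 1, 2, \ldots, k$. Use this result to prove that the set
of solutions to the system is an n-dimensional real vector space.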


16. Let $C \in M_{m \times n}(R)$, and let Y be an $n \times p$ matrix of differentiable
functions. Prove $(CY)' = CY'$, where $(Y')_{ij} = Y_{ij}'$ for all i, j.


Exercises 17 through 19 are concerned with simultaneous diagonalization. 


Definitions. Two linear operators T and U on a finite-dimensional vector
space V are called simultaneously diagonalizable if there exists an ordered
basis $\beta$ for V such that both $[T]_\beta$ and $[U]_\beta$ are diagonal matrices. Similarly,
$A, B \in M_{n \times n}(F)$ are called simultaneously diagonalizable if there exists
an invertible matrix $Q \in M_{n \times n}(F)$ such that both $Q^{-1}AQ$ and $Q^{-1}BQ$ are
diagonal matrices.


17. (a) Prove that if T and U are simultaneously diagonalizable linear
operators on a finite-dimensional vector space V, then the matrices
$[T]_\beta$ and $[U]_\beta$ are simultaneously diagonalizable for any ordered
basis $\beta$.
(b) Prove that if A and B are simultaneously diagonalizable matrices,
then $L_A$ and $L_B$ are simultaneously diagonalizable linear operators.


18. (a) Prove that if T and U are simultaneously diagonalizable operators, 
then T and U commute (i.e., TU = UT). 
(b) Show that if A and B are simultaneously diagonalizable matrices, 
then A and B commute. 


The converses of (a) and (b) are established in Exercise 25 of Section 5.4. 


19. Let T be a diagonalizable linear operator on a finite-dimensional vector 
space, and let m be any positive integer. Prove that T and $T^m$ are
simultaneously diagonalizable. 


Exercises 20 through 23 are concerned with direct sums. 
20. Let $W_1, W_2, \ldots, W_k$ be subspaces of a finite-dimensional vector space V
such that

$$\sum_{i=1}^{k} W_i = V.$$

Prove that V is the direct sum of $W_1, W_2, \ldots, W_k$ if and only if

$$\dim(V) = \sum_{i=1}^{k} \dim(W_i).$$


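21. Let V be a finite-dimensional vector space with a basis $\beta$, and let
$\beta_1, \beta_2, \ldots, \beta_k$ be a partition of $\beta$ (i.e., $\beta_1, \beta_2, \ldots, \beta_k$ are subsets of $\beta$
such that $\beta = \beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$ and $\beta_i \cap \beta_j = \emptyset$ if $i \ne j$). Prove that
$V = \operatorname{span}(\beta_1) \oplus \operatorname{span}(\beta_2) \oplus \cdots \oplus \operatorname{span}(\beta_k)$.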


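22. Let T be a linear operator on a finite-dimensional vector space V, and
suppose that the distinct eigenvalues of T are $\lambda_1, \lambda_2, \ldots, \lambda_k$. Prove that

$$\operatorname{span}(\{x \in V : x \text{ is an eigenvector of } T\}) = E_{\lambda_1} \oplus E_{\lambda_2} \oplus \cdots \oplus E_{\lambda_k}.$$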


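23. Let $W_1, W_2, K_1, K_2, \ldots, K_p, M_1, M_2, \ldots, M_q$ be subspaces of a vector
space V such that $W_1 = K_1 \oplus K_2 \oplus \cdots \oplus K_p$ and $W_2 = M_1 \oplus M_2 \oplus \cdots \oplus M_q$.
Prove that if $W_1 \cap W_2 = \{0\}$, then

$$W_1 + W_2 = W_1 \oplus W_2 = K_1 \oplus K_2 \oplus \cdots \oplus K_p \oplus M_1 \oplus M_2 \oplus \cdots \oplus M_q.$$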


5.3* MATRIX LIMITS AND MARKOV CHAINS 


In this section, we apply what we have learned thus far in Chapter 5 to study
the limit of a sequence of powers $A, A^2, \ldots, A^m, \ldots$, where A is a square
matrix with complex entries. Such sequences and their limits have practical
applications in the natural and social sciences.

We assume familiarity with limits of sequences of real numbers. The
limit of a sequence of complex numbers $\{z_m : m = 1, 2, \ldots\}$ can be defined
in terms of the limits of the sequences of the real and imaginary parts: If
$z_m = r_m + is_m$, where $r_m$ and $s_m$ are real numbers, and i is the imaginary
number such that $i^2 = -1$, then

$$\lim_{m \to \infty} z_m = \lim_{m \to \infty} r_m + i \lim_{m \to \infty} s_m,$$

provided that $\lim_{m \to \infty} r_m$ and $\lim_{m \to \infty} s_m$ exist.
Definition. Let $L, A_1, A_2, \ldots$ be $n \times p$ matrices having complex entries.
The sequence $A_1, A_2, \ldots$ is said to converge to the $n \times p$ matrix L, called
the limit of the sequence, if

$$\lim_{m \to \infty} (A_m)_{ij} = L_{ij}$$

for all $1 \le i \le n$ and $1 \le j \le p$. To designate that L is the limit of the
sequence, we write

$$\lim_{m \to \infty} A_m = L.$$


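Example 1
If

$$A_m = \begin{pmatrix} 1 - \dfrac{1}{m} & \left(-\dfrac{3}{4}\right)^m & \dfrac{3m^2}{m^2+1} + i\left(\dfrac{2m+1}{m-1}\right) \\[2ex] \left(\dfrac{i}{2}\right)^m & 2 & \left(1 + \dfrac{1}{m}\right)^m \end{pmatrix},$$

then

$$\lim_{m \to \infty} A_m = \begin{pmatrix} 1 & 0 & 3 + 2i \\ 0 & 2 & e \end{pmatrix},$$

where e is the base of the natural logarithm.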


A simple, but important, property of matrix limits is contained in the next
theorem. Note the analogy with the familiar property of limits of sequences
of real numbers that asserts that if $\lim_{m \to \infty} a_m$ exists, then

$$\lim_{m \to \infty} ca_m = c\left(\lim_{m \to \infty} a_m\right).$$


Theorem 5.12. Let $A_1, A_2, \ldots$ be a sequence of $n \times p$ matrices with
complex entries that converges to the matrix L. Then for any $P \in M_{r \times n}(C)$
and $Q \in M_{p \times s}(C)$,

$$\lim_{m \to \infty} PA_m = PL \quad \text{and} \quad \lim_{m \to \infty} A_mQ = LQ.$$

Proof. For any i $(1 \le i \le r)$ and j $(1 \le j \le p)$,

$$\lim_{m \to \infty} (PA_m)_{ij} = \lim_{m \to \infty} \sum_{k=1}^{n} P_{ik}(A_m)_{kj} = \sum_{k=1}^{n} P_{ik} \lim_{m \to \infty} (A_m)_{kj} = \sum_{k=1}^{n} P_{ik}L_{kj} = (PL)_{ij}.$$

Hence $\lim_{m \to \infty} PA_m = PL$. The proof that $\lim_{m \to \infty} A_mQ = LQ$ is similar. ■


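Corollary. Let $A \in M_{n \times n}(C)$ be such that $\lim_{m \to \infty} A^m = L$. Then for any
invertible matrix $Q \in M_{n \times n}(C)$,

$$\lim_{m \to \infty} (QAQ^{-1})^m = QLQ^{-1}.$$

Proof. Since

$$(QAQ^{-1})^m = (QAQ^{-1})(QAQ^{-1})\cdots(QAQ^{-1}) = QA^mQ^{-1},$$

we have

$$\lim_{m \to \infty} (QAQ^{-1})^m = \lim_{m \to \infty} QA^mQ^{-1} = Q\left(\lim_{m \to \infty} A^m\right)Q^{-1} = QLQ^{-1}$$

by applying Theorem 5.12 twice. ■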
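The corollary can be illustrated numerically by applying it to a diagonal matrix D whose powers have an obvious limit. In the Python sketch below, Q, its inverse `Q_inv`, and $D = \operatorname{diag}(1, -\tfrac{1}{2}, \tfrac{1}{4})$ are illustrative choices of our own; since $D^m \to L_D = \operatorname{diag}(1, 0, 0)$, the corollary predicts $(QDQ^{-1})^m \to QL_DQ^{-1}$. Exact rational arithmetic (`fractions.Fraction`) is used to form the 60th power and measure its distance from that prediction; a small gap at one large m is evidence of convergence, not a proof.

```python
from fractions import Fraction as Fr

# Illustrative choices (ours): an invertible Q, its inverse, and a diagonal D
# whose powers converge to L_D = diag(1, 0, 0).
Q = [[1, 3, -1], [-3, -2, 1], [2, 3, -1]]
Q_inv = [[-1, 0, 1], [-1, 1, 2], [-5, 3, 7]]
D = [[Fr(1), 0, 0], [0, Fr(-1, 2), 0], [0, 0, Fr(1, 4)]]
L_D = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


assert matmul(Q, Q_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

A = matmul(matmul(Q, D), Q_inv)             # A = Q D Q^{-1}
predicted = matmul(matmul(Q, L_D), Q_inv)   # Q L_D Q^{-1}, the predicted limit

power = A
for _ in range(59):                         # after the loop, power = A^60 (exact)
    power = matmul(power, A)

gap = max(abs(float(power[i][j] - predicted[i][j])) for i in range(3) for j in range(3))
print(predicted)  # [[-1, 0, 1], [3, 0, -3], [-2, 0, 2]]
print(gap)        # tiny: the remaining terms are multiples of (1/2)^60 and (1/4)^60
```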
In the discussion that follows, we frequently encounter the set

$$S = \{\lambda \in C : |\lambda| < 1 \text{ or } \lambda = 1\}.$$

Geometrically, this set consists of the complex number 1 and the interior of
the unit disk (the disk of radius 1 centered at the origin). This set is of
interest because if $\lambda$ is a complex number, then $\lim_{m \to \infty} \lambda^m$ exists if and only
if $\lambda \in S$. This fact, which is obviously true if $\lambda$ is real, can be shown to be true
for complex numbers also.

The following important result gives necessary and sufficient conditions 
for the existence of the type of limit under consideration. 


Theorem 5.13. Let A be a square matrix with complex entries. Then
$\lim_{m \to \infty} A^m$ exists if and only if both of the following conditions hold.

(a) Every eigenvalue of A is contained in S.
(b) If 1 is an eigenvalue of A, then the dimension of the eigenspace corresponding to 1 equals the multiplicity of 1 as an eigenvalue of A.


One proof of this theorem, which relies on the theory of Jordan canonical 
forms (Section 7.2), can be found in Exercise 19 of Section 7.2. A second 
proof, which makes use of Schur’s theorem (Theorem 6.14 of Section 6.4), 
can be found in the article by S. H. Friedberg and A. J. Insel, “Convergence 
of matrix powers,” Int. J. Math. Educ. Sci. Technol., 1992, Vol. 23, no. 5, 
pp. 765-769. 
The necessity of condition (a) is easily justified. For suppose that $\lambda$ is an
eigenvalue of A such that $\lambda \notin S$. Let v be an eigenvector of A corresponding
to $\lambda$. Regarding v as an $n \times 1$ matrix, we see that if $L = \lim_{m \to \infty} A^m$ exists, then

$$\lim_{m \to \infty} (A^m v) = \left(\lim_{m \to \infty} A^m\right)v = Lv$$

by Theorem 5.12. But $\lim_{m \to \infty} (A^m v) = \lim_{m \to \infty} (\lambda^m v)$
diverges because $\lim_{m \to \infty} \lambda^m$ does not exist. Hence if $\lim_{m \to \infty} A^m$ exists, then
condition (a) of Theorem 5.13 must hold.
Although we are unable to prove the necessity of condition (b) here, we consider an example for which this condition fails. Observe that the characteristic polynomial for the matrix

$$B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

is $(t-1)^2$, and hence B has eigenvalue $\lambda = 1$ with multiplicity 2. It can easily be verified that $\dim(E_\lambda) = 1$, so that condition (b) of Theorem 5.13 is violated. A simple mathematical induction argument can be used to show that

$$B^m = \begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix},$$

and therefore that $\lim_{m\to\infty} B^m$ does not exist. We see in Chapter 7 that if A is a matrix for which condition (b) fails, then A is similar to a matrix whose upper left $2 \times 2$ submatrix is precisely this matrix B.
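The induction claim about $B^m$ is easy to spot-check. The following minimal Python sketch (plain lists of rows, no external libraries; the helper names `mat_mul` and `mat_pow` are our own) prints a few powers of B; the (1, 2) entry equals m, so the entries of $B^m$ cannot converge.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_pow(X, m):
    """X**m for a square matrix X and an integer m >= 1 (repeated multiplication)."""
    result = X
    for _ in range(m - 1):
        result = mat_mul(result, X)
    return result


B = [[1, 1],
     [0, 1]]

for m in (1, 2, 5, 10):
    print(m, mat_pow(B, m))  # the (1, 2) entry equals m, so B^m has no limit
```

Repeated multiplication is used only for clarity; it is adequate for the small powers shown.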

In most of the applications involving matrix limits, the matrix is diag- 
onalizable, and so condition (b) of Theorem 5.13 is automatically satisfied. 
In this case, Theorem 5.13 reduces to the following theorem, which can be 
proved using our previous results. 


Theorem 5.14. Let $A \in M_{n\times n}(C)$ satisfy the following two conditions.
(i) Every eigenvalue of A is contained in S.
(ii) A is diagonalizable.

Then $\lim_{m\to\infty} A^m$ exists.


Proof. Since A is diagonalizable, there exists an invertible matrix Q such that $Q^{-1}AQ = D$ is a diagonal matrix. Suppose that

$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

Because $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A, condition (i) requires that for each i, either $\lambda_i = 1$ or $|\lambda_i| < 1$. Thus

$$\lim_{m\to\infty} \lambda_i^m = \begin{cases} 1 & \text{if } \lambda_i = 1 \\ 0 & \text{otherwise.} \end{cases}$$

But since

$$D^m = \begin{pmatrix} \lambda_1^m & 0 & \cdots & 0 \\ 0 & \lambda_2^m & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^m \end{pmatrix},$$

the sequence $D, D^2, \ldots$ converges to a limit L. Hence

$$\lim_{m\to\infty} A^m = \lim_{m\to\infty}(QDQ^{-1})^m = QLQ^{-1}$$

by the corollary to Theorem 5.12. ∎

The technique for computing $\lim_{m\to\infty} A^m$ used in the proof of Theorem 5.14 can be employed in actual computations, as we now illustrate. Let


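$$A = \begin{pmatrix} \frac{7}{4} & -\frac{9}{4} & -\frac{15}{4} \\ \frac{3}{4} & \frac{7}{4} & \frac{3}{4} \\ \frac{3}{4} & -\frac{9}{4} & -\frac{11}{4} \end{pmatrix}.$$

Using the methods in Sections 5.1 and 5.2, we obtain

$$Q = \begin{pmatrix} 1 & 3 & -1 \\ -3 & -2 & 1 \\ 2 & 3 & -1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{4} \end{pmatrix}$$

such that $Q^{-1}AQ = D$. Hence

$$\begin{aligned}
\lim_{m\to\infty} A^m &= \lim_{m\to\infty}(QDQ^{-1})^m = \lim_{m\to\infty} QD^mQ^{-1} = Q\left(\lim_{m\to\infty} D^m\right)Q^{-1} \\
&= \begin{pmatrix} 1 & 3 & -1 \\ -3 & -2 & 1 \\ 2 & 3 & -1 \end{pmatrix}
\left[\lim_{m\to\infty}\begin{pmatrix} 1 & 0 & 0 \\ 0 & \left(-\frac{1}{2}\right)^m & 0 \\ 0 & 0 & \left(\frac{1}{4}\right)^m \end{pmatrix}\right]
\begin{pmatrix} -1 & 0 & 1 \\ -1 & 1 & 2 \\ -5 & 3 & 7 \end{pmatrix} \\
&= \begin{pmatrix} 1 & 3 & -1 \\ -3 & -2 & 1 \\ 2 & 3 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} -1 & 0 & 1 \\ -1 & 1 & 2 \\ -5 & 3 & 7 \end{pmatrix}
= \begin{pmatrix} -1 & 0 & 1 \\ 3 & 0 & -3 \\ -2 & 0 & 2 \end{pmatrix}.
\end{aligned}$$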
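The computation can be verified in exact rational arithmetic. This is a minimal sketch using Python's standard `fractions` module; the helpers `mat_mul`, `diag`, and `power` are our own illustrative names. It checks that $Q^{-1}AQ = D$, forms $Q(\lim D^m)Q^{-1}$, and measures how close $A^{30}$ is to L.

```python
from fractions import Fraction as F


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def diag(entries):
    """Diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else F(0) for j in range(n)] for i in range(n)]


A = [[F(7, 4), F(-9, 4), F(-15, 4)],
     [F(3, 4), F(7, 4), F(3, 4)],
     [F(3, 4), F(-9, 4), F(-11, 4)]]
Q = [[1, 3, -1], [-3, -2, 1], [2, 3, -1]]
Q_inv = [[-1, 0, 1], [-1, 1, 2], [-5, 3, 7]]
eigenvalues = [F(1), F(-1, 2), F(1, 4)]

# Sanity check of the diagonalization Q^{-1} A Q = D.
assert mat_mul(mat_mul(Q_inv, A), Q) == diag(eigenvalues)


def power(m):
    """A^m computed as Q D^m Q^{-1}."""
    return mat_mul(mat_mul(Q, diag([lam ** m for lam in eigenvalues])), Q_inv)


# lim D^m = diag(1, 0, 0), since (-1/2)^m -> 0 and (1/4)^m -> 0.
L = mat_mul(mat_mul(Q, diag([F(1), F(0), F(0)])), Q_inv)
for row in L:
    print(" ".join(str(x) for x in row))

# Largest entrywise gap between A^30 and L (it shrinks like (1/2)^m).
print(float(max(abs(x - y) for r1, r2 in zip(power(30), L) for x, y in zip(r1, r2))))
```

Exact fractions avoid any doubt about rounding when comparing $Q^{-1}AQ$ with D.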
Next, we consider an application that uses the limit of powers of a matrix. Suppose that the population of a certain metropolitan area remains
constant but there is a continual movement of people between the city and 
the suburbs. Specifically, let the entries of the following matrix A represent 
the probabilities that someone living in the city or in the suburbs on January 
1 will be living in each region on January 1 of the next year. 


$$\begin{array}{l|cc}
 & \text{Currently living in the city} & \text{Currently living in the suburbs} \\ \hline
\text{Living next year in the city} & 0.90 & 0.02 \\
\text{Living next year in the suburbs} & 0.10 & 0.98
\end{array} \;=\; A$$

For instance, the probability that someone living in the city (on January 1) 
will be living in the suburbs next year (on January 1) is 0.10. Notice that 
since the entries of A are probabilities, they are nonnegative. Moreover, the 
assumption of a constant population in the metropolitan area requires that 
the sum of the entries of each column of A be 1. 

Any square matrix having these two properties (nonnegative entries and columns that sum to 1) is called a transition matrix or a stochastic matrix. For an arbitrary $n \times n$ transition matrix M, the rows and columns correspond to n states, and the entry $M_{ij}$ represents the probability of moving from state j to state i in one stage.

In our example, there are two states (residing in the city and residing in the suburbs). So, for example, $A_{21}$ is the probability of moving from the city to the suburbs in one stage, that is, in one year. We now determine the probability that a city resident will be living in the suburbs after 2 years. There are two different ways in which such a move can be made: remaining in the city for 1 year and then moving to the suburbs, or moving to the suburbs during the first year and remaining there the second year. (See Figure 5.3.) The probability that a city dweller remains in the city for the first year is 0.90, whereas the probability that the city dweller moves to the suburbs during the first year is 0.10. Hence the probability that a city dweller stays in the city for the first year and then moves to the suburbs during the second year is the product (0.90)(0.10). Likewise, the probability that a city dweller moves to the suburbs in the first year and remains in the suburbs during the second year is the product (0.10)(0.98). Thus the probability that a city dweller will be living in the suburbs after 2 years is the sum of these products, (0.90)(0.10) + (0.10)(0.98) = 0.188. Observe that this number is obtained by the same calculation as that which produces $(A^2)_{21}$, and hence $(A^2)_{21}$ represents the probability that a city dweller will be living in the suburbs after 2 years. In general, for any transition matrix M, the entry $(M^m)_{ij}$ represents the probability of moving from state j to state i in m stages.

[Figure 5.3: the two routes from City to Suburbs over two years, with transition probabilities 0.90 (City to City), 0.10 (City to Suburbs), and 0.98 (Suburbs to Suburbs).]
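The two-path argument can be mirrored directly in code. In this small floating-point sketch (indices are 0-based, so $A_{21}$ is `A[1][0]`; the variable names are ours), the sum over the two routes and the (2, 1) entry of $A^2$ are computed separately and agree.

```python
# Transition matrix of the city-suburb example:
# column j = current state, row i = state next year (0 = city, 1 = suburbs).
A = [[0.90, 0.02],
     [0.10, 0.98]]

# Route 1: stay in the city, then move.  Route 2: move, then stay in the suburbs.
two_paths = A[0][0] * A[1][0] + A[1][0] * A[1][1]   # (0.90)(0.10) + (0.10)(0.98)

# The (2, 1) entry of A^2: row 2 of A times column 1 of A.
A2_21 = sum(A[1][k] * A[k][0] for k in range(2))

print(round(two_paths, 3), round(A2_21, 3))   # 0.188 0.188
```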

Suppose additionally that 70% of the 2000 population of the metropolitan 
area lived in the city and 30% lived in the suburbs. We record these data as 
a column vector: 


$$\begin{array}{l|c}
\text{Proportion of city dwellers} & 0.70 \\
\text{Proportion of suburb residents} & 0.30
\end{array} \;=\; P$$


Notice that the rows of P correspond to the states of residing in the city and 
residing in the suburbs, respectively, and that these states are listed in the 
same order as the listing in the transition matrix A. Observe also that the 
column vector P contains nonnegative entries that sum to 1; such a vector is 
called a probability vector. In this terminology, each column of a transition 
matrix is a probability vector. It is often convenient to regard the entries of a 
transition matrix or a probability vector as proportions or percentages instead 
of probabilities, as we have already done with the probability vector P. 

In the vector AP, the first coordinate is the sum (0.90) (0.70)+(0.02) (0.30). 
The first term of this sum, (0.90)(0.70), represents the proportion of the 2000 
metropolitan population that remained in the city during the next year, and 
the second term, (0.02)(0.30), represents the proportion of the 2000 metropoli- 
tan population that moved into the city during the next year. Hence the first 
coordinate of AP represents the proportion of the metropolitan population 
that was living in the city in 2001. Similarly, the second coordinate of

$$AP = \begin{pmatrix} 0.636 \\ 0.364 \end{pmatrix}$$

represents the proportion of the metropolitan population that was living in the suburbs in 2001. This argument can be easily extended to show that the coordinates of

$$A^2P = A(AP) = \begin{pmatrix} 0.57968 \\ 0.42032 \end{pmatrix}$$

represent the proportions of the metropolitan population that were living in each location in 2002. In general, the coordinates of $A^mP$ represent the proportions of the metropolitan population that will be living in the city and suburbs, respectively, after m stages (m years after 2000).
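A minimal sketch of the year-by-year computation $P, AP, A^2P, \ldots$ in plain Python floats; the function name `step` and the dictionary `history` are our own.

```python
def step(A, p):
    """One stage of the chain: return the vector A p."""
    return [sum(A[i][j] * p[j] for j in range(len(p))) for i in range(len(A))]


A = [[0.90, 0.02],
     [0.10, 0.98]]
p = [0.70, 0.30]            # distribution in 2000: (city, suburbs)

history = {2000: p}
for year in range(2001, 2006):
    p = step(A, p)          # A^m P with m = year - 2000
    history[year] = p

for year, q in history.items():
    print(year, [round(x, 5) for x in q])
```

The 2001 and 2002 rows reproduce AP and $A^2P$ above, and every row still sums to 1.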

Will the city eventually be depleted if this trend continues? In view of the preceding discussion, it is natural to define the eventual proportion of the city dwellers and suburbanites to be the first and second coordinates, respectively, of $\lim_{m\to\infty} A^mP$. We now compute this limit. It is easily shown that A is diagonalizable, and so there is an invertible matrix Q and a diagonal matrix D such that $Q^{-1}AQ = D$. In fact,

$$Q = \begin{pmatrix} 1 & -1 \\ 5 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 \\ 0 & 0.88 \end{pmatrix}.$$

Therefore

$$L = \lim_{m\to\infty} A^m = \lim_{m\to\infty} QD^mQ^{-1} = Q\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}Q^{-1} = \begin{pmatrix} \frac{1}{6} & \frac{1}{6} \\ \frac{5}{6} & \frac{5}{6} \end{pmatrix}.$$

Consequently

$$\lim_{m\to\infty} A^mP = LP = \begin{pmatrix} \frac{1}{6} \\ \frac{5}{6} \end{pmatrix}.$$

Thus, eventually, $\frac{1}{6}$ of the population will live in the city and $\frac{5}{6}$ will live in the suburbs each year. Note that the vector LP satisfies $A(LP) = LP$. Hence LP is both a probability vector and an eigenvector of A corresponding to the eigenvalue 1. Since the eigenspace of A corresponding to the eigenvalue 1 is one-dimensional, there is only one such vector, and LP is independent of the initial choice of probability vector P. (See Exercise 15.) For example, had the 2000 metropolitan population consisted entirely of city dwellers, the limiting outcome would be the same.
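The limit L and the independence of LP from P can be checked exactly with a short sketch (standard `fractions` module; helper names are illustrative, and `Q_inv` is simply the inverse of the Q above, computed by hand).

```python
from fractions import Fraction as F


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[F(90, 100), F(2, 100)],
     [F(10, 100), F(98, 100)]]
Q = [[1, -1], [5, 1]]
Q_inv = [[F(1, 6), F(1, 6)], [F(-5, 6), F(1, 6)]]
D_limit = [[1, 0], [0, 0]]          # lim diag(1, 0.88**m)

L = mat_mul(mat_mul(Q, D_limit), Q_inv)
print([[str(x) for x in row] for row in L])   # [['1/6', '1/6'], ['5/6', '5/6']]

# LP does not depend on the initial probability vector P.
for P in ([F(7, 10), F(3, 10)], [F(1), F(0)], [F(1, 2), F(1, 2)]):
    LP = mat_mul(L, [[x] for x in P])
    print([str(row[0]) for row in LP])        # ['1/6', '5/6'] each time
```

Because both columns of L equal $(\frac16, \frac56)^t$, LP depends only on the coordinate sum of P, which is 1 for every probability vector.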

In analyzing the city-suburb problem, we gave probabilistic interpretations of $A^2$ and AP, showing that $A^2$ is a transition matrix and AP is a
probability vector. In fact, the product of any two transition matrices is a 
transition matrix, and the product of any transition matrix and probability 
vector is a probability vector. A proof of these facts is a simple corollary 
of the next theorem, which characterizes transition matrices and probability 
vectors. 


Theorem 5.15. Let M be an $n \times n$ matrix having real nonnegative entries, let v be a column vector in $R^n$ having nonnegative coordinates, and let $u \in R^n$ be the column vector in which each coordinate equals 1. Then

(a) M is a transition matrix if and only if $M^tu = u$;
(b) v is a probability vector if and only if $u^tv = (1)$.


Proof. Exercise. ∎


Corollary. 

(a) The product of two n x n transition matrices is an n x n transition 
matrix. In particular, any power of a transition matrix is a transition 
matrix. 

(b) The product of a transition matrix and a probability vector is a prob- 
ability vector. 


Proof. Exercise. ∎
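Theorem 5.15 turns into a simple column-sum test. The sketch below uses our own function names and exact `Fraction` arithmetic, so that sums such as 0.1 + 0.7 + 0.2 compare equal to 1 without rounding error. It checks the corollary on two transition matrices taken from Examples 4 and 5 later in this section.

```python
from fractions import Fraction as F


def is_transition_matrix(M):
    """Theorem 5.15(a): for nonnegative M, M is a transition matrix iff M^t u = u,
    i.e. every column of M sums to 1."""
    n = len(M)
    if any(x < 0 for row in M for x in row):
        return False
    return all(sum(M[i][j] for i in range(n)) == 1 for j in range(n))


def is_probability_vector(v):
    """Theorem 5.15(b): for nonnegative v, v is a probability vector iff u^t v = (1)."""
    return all(x >= 0 for x in v) and sum(v) == 1


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


h = F(1, 2)
crops = [[0, h, h], [h, 0, h], [h, h, 0]]            # from Example 5
survey = [[F(4, 10), F(2, 10), F(2, 10)],            # from Example 4
          [F(1, 10), F(7, 10), F(2, 10)],
          [F(5, 10), F(1, 10), F(6, 10)]]
P = [F(1, 2), F(1, 3), F(1, 6)]

print(is_transition_matrix(mat_mul(crops, survey)))   # True: corollary (a)
print(is_probability_vector(mat_vec(survey, P)))      # True: corollary (b)
```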


The city-suburb problem is an example of a process in which elements of 
a set are each classified as being in one of several fixed states that can switch 
over time. In general, such a process is called a stochastic process. The 
switching to a particular state is described by a probability, and in general 
this probability depends on such factors as the state in question, the time 
in question, some or all of the previous states in which the object has been 
(including the current state), and the states that other objects are in or have 
been in. 

For instance, the object could be an American voter, and the state of the 
object could be his or her preference of political party; or the object could 
be a molecule of H2O, and the states could be the three physical states in 
which H2O can exist (solid, liquid, and gas). In these examples, all four of 
the factors mentioned above influence the probability that an object is in a 
particular state at a particular time. 

If, however, the probability that an object in one state changes to a differ- 
ent state in a fixed interval of time depends only on the two states (and not on 
the time, earlier states, or other factors), then the stochastic process is called 
a Markov process. If, in addition, the number of possible states is finite, 
then the Markov process is called a Markov chain. We treated the city-suburb example as a two-state Markov chain. Of course, a Markov process is
usually only an idealization of reality because the probabilities involved are 
almost never constant over time. 

With this in mind, we consider another Markov chain. A certain com- 
munity college would like to obtain information about the likelihood that 
students in various categories will graduate. The school classifies a student 
as a sophomore or a freshman depending on the number of credits that the 
student has earned. Data from the school indicate that, from one fall semester 
to the next, 40% of the sophomores will graduate, 30% will remain sopho- 
mores, and 30% will quit permanently. For freshmen, the data show that 
10% will graduate by next fall, 50% will become sophomores, 20% will re- 
main freshmen, and 20% will quit permanently. During the present year, 
50% of the students at the school are sophomores and 50% are freshmen. Assuming that the trend indicated by the data continues indefinitely, the school
would like to know 


1. the percentage of the present students who will graduate, the percentage 
who will be sophomores, the percentage who will be freshmen, and the 
percentage who will quit school permanently by next fall; 

2. the same percentages as in item 1 for the fall semester two years hence; 
and 

3. the probability that one of its present students will eventually graduate. 


The preceding paragraph describes a four-state Markov chain with the 
following states: 


1. having graduated 

2. being a sophomore 

3. being a freshman 

4. having quit permanently. 


The given data provide us with the transition matrix 


$$A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}$$


of the Markov chain. (Notice that students who have graduated or have quit 
permanently are assumed to remain indefinitely in those respective states. 
Thus a freshman who quits the school and returns during a later semester 
is not regarded as having changed states—the student is assumed to have 
remained in the state of being a freshman during the time he or she was not 
enrolled.) Moreover, we are told that the present distribution of students is 
half in each of states 2 and 3 and none in states 1 and 4. The vector 


$$P = \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix}$$


that describes the initial probability of being in each state is called the initial 
probability vector for the Markov chain. 

To answer question 1, we must determine the probabilities that a present 
student will be in each state by next fall. As we have seen, these probabilities 
are the coordinates of the vector 


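$$AP = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix}.$$

Hence by next fall, 25% of the present students will graduate, 40% will be sophomores, 10% will be freshmen, and 25% will quit the school permanently. Similarly,

$$A^2P = A(AP) = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}\begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix} = \begin{pmatrix} 0.42 \\ 0.17 \\ 0.02 \\ 0.39 \end{pmatrix}$$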


provides the information needed to answer question 2: within two years 42% 
of the present students will graduate, 17% will be sophomores, 2% will be 
freshmen, and 39% will quit school. 
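The two matrix-vector products can be reproduced exactly with a short sketch (standard `fractions` module; `mat_vec` is our own helper).

```python
from fractions import Fraction as F


def mat_vec(M, v):
    """Return M v for a matrix stored as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


# States: 1 graduated, 2 sophomore, 3 freshman, 4 quit permanently.
A = [[1, F(4, 10), F(1, 10), 0],
     [0, F(3, 10), F(5, 10), 0],
     [0, 0,        F(2, 10), 0],
     [0, F(3, 10), F(2, 10), 1]]
P = [0, F(1, 2), F(1, 2), 0]

AP = mat_vec(A, P)        # distribution next fall
A2P = mat_vec(A, AP)      # distribution two years hence
print([float(x) for x in AP])    # [0.25, 0.4, 0.1, 0.25]
print([float(x) for x in A2P])   # [0.42, 0.17, 0.02, 0.39]
```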

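Finally, the answer to question 3 is provided by the vector LP, where $L = \lim_{m\to\infty} A^m$. For the matrices

$$Q = \begin{pmatrix} 1 & 4 & 19 & 0 \\ 0 & -7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 3 & 13 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.3 & 0 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

we have $Q^{-1}AQ = D$. Thus

$$\begin{aligned}
L = \lim_{m\to\infty} A^m &= Q\left(\lim_{m\to\infty} D^m\right)Q^{-1} \\
&= \begin{pmatrix} 1 & 4 & 19 & 0 \\ 0 & -7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 3 & 13 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & -\frac{1}{7} & -\frac{5}{7} & 0 \\ 0 & 0 & \frac{1}{8} & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix} \\
&= \begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix}.
\end{aligned}$$

So

$$LP = \begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{59}{112} \\ 0 \\ 0 \\ \frac{53}{112} \end{pmatrix},$$

and hence the probability that one of the present students will graduate is $\frac{59}{112}$.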
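A sketch that verifies $Q^{-1}AQ = D$ for the matrices above and recomputes L and LP exactly; `Q_inv` is the inverse of Q written out by hand, and the helper names are ours.

```python
from fractions import Fraction as F


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, F(4, 10), F(1, 10), 0],
     [0, F(3, 10), F(5, 10), 0],
     [0, 0,        F(2, 10), 0],
     [0, F(3, 10), F(2, 10), 1]]
Q = [[1, 4, 19, 0], [0, -7, -40, 0], [0, 0, 8, 0], [0, 3, 13, 1]]
Q_inv = [[1, F(4, 7), F(27, 56), 0],
         [0, F(-1, 7), F(-5, 7), 0],
         [0, 0, F(1, 8), 0],
         [0, F(3, 7), F(29, 56), 1]]
# lim D^m: the eigenvalues 1 survive, while 0.3^m and 0.2^m tend to 0.
D_limit = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]

L = mat_mul(mat_mul(Q, D_limit), Q_inv)
LP = mat_mul(L, [[0], [F(1, 2)], [F(1, 2)], [0]])
print([str(row[0]) for row in LP])   # ['59/112', '0', '0', '53/112']
```

The first coordinate of LP is the probability of eventually graduating; the last is the probability of eventually quitting.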
In the preceding two examples, we saw that $\lim_{m\to\infty} A^mP$, where A is the transition matrix and P is the initial probability vector of the Markov chain, gives the eventual proportions in each state. In general, however, the limit of powers of a transition matrix need not exist. For example, if

$$M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

then $\lim_{m\to\infty} M^m$ does not exist because odd powers of M equal M and even powers of M equal I. The reason that the limit fails to exist is that condition (a) of Theorem 5.13 does not hold for M ($-1$ is an eigenvalue). In fact, it can be shown (see Exercise 20 of Section 7.2) that the transition matrices A for which $\lim_{m\to\infty} A^m$ does not exist are precisely those matrices for which condition (a) of Theorem 5.13 fails to hold.

But even if the limit of powers of the transition matrix exists, the compu- 
tation of the limit may be quite difficult. (The reader is encouraged to work 
Exercise 6 to appreciate the truth of the last sentence.) Fortunately, there is 
a large and important class of transition matrices for which this limit exists 
and is easily computed—this is the class of regular transition matrices. 


Definition. A transition matrix is called regular if some power of the 
matrix contains only positive entries. 


Example 2 


The transition matrix

$$\begin{pmatrix} 0.90 & 0.02 \\ 0.10 & 0.98 \end{pmatrix}$$

of the Markov chain used in the city-suburb problem is clearly regular because each entry is positive. On the other hand, the transition matrix

$$A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}$$

of the Markov chain describing community college enrollments is not regular because the first column of $A^m$ is

$$\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$

for any power m.
Observe that a regular transition matrix may contain zero entries. For example,

$$M = \begin{pmatrix} 0.9 & 0.5 & 0 \\ 0 & 0.5 & 0.4 \\ 0.1 & 0 & 0.6 \end{pmatrix}$$

is regular because every entry of $M^2$ is positive.
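Regularity can be tested mechanically. This sketch is our own illustration, not part of the text. Only the zero/nonzero pattern of a nonnegative matrix affects which entries of its powers are positive, so it multiplies Boolean patterns. It stops at the power $(n-1)^2 + 1$, relying on Wielandt's bound for primitive matrices, a result from Perron-Frobenius theory that is not proved in this book.

```python
def first_positive_power(M):
    """Smallest k >= 1 such that every entry of M^k is positive, or None.

    Works on the zero/nonzero pattern only. The search stops at
    k = (n - 1)**2 + 1 (Wielandt's bound, imported from outside this text).
    """
    n = len(M)
    pattern = [[x > 0 for x in row] for row in M]
    power = pattern                               # pattern of M^1
    for k in range(1, (n - 1) ** 2 + 2):
        if all(all(row) for row in power):
            return k
        # (M^{k+1})_{ij} > 0 iff (M^k)_{il} > 0 and M_{lj} > 0 for some l.
        power = [[any(power[i][l] and pattern[l][j] for l in range(n))
                  for j in range(n)] for i in range(n)]
    return None


def is_regular(M):
    return first_positive_power(M) is not None


M = [[0.9, 0.5, 0.0],
     [0.0, 0.5, 0.4],
     [0.1, 0.0, 0.6]]
print(first_positive_power(M))   # 2
```

For the enrollment matrix A, the first column never changes, so the function returns None.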


The remainder of this section is devoted to proving that, for a regular 
transition matrix A, the limit of the sequence of powers of A exists and 
has identical columns. From this fact, it is easy to compute this limit. In 
the course of proving this result, we obtain some interesting bounds for the 
magnitudes of eigenvalues of any square matrix. These bounds are given in terms of the sums of the absolute values of the entries in the rows and columns of the matrix.
The necessary terminology is introduced in the definitions that follow. 


Definitions. Let $A \in M_{n\times n}(C)$. For $1 \le i, j \le n$, define $\rho_i(A)$ to be the sum of the absolute values of the entries of row i of A, and define $\nu_j(A)$ to be equal to the sum of the absolute values of the entries of column j of A. Thus

$$\rho_i(A) = \sum_{j=1}^{n} |A_{ij}| \quad\text{for } i = 1, 2, \ldots, n$$

and

$$\nu_j(A) = \sum_{i=1}^{n} |A_{ij}| \quad\text{for } j = 1, 2, \ldots, n.$$

The row sum of A, denoted $\rho(A)$, and the column sum of A, denoted $\nu(A)$, are defined as

$$\rho(A) = \max\{\rho_i(A) : 1 \le i \le n\} \quad\text{and}\quad \nu(A) = \max\{\nu_j(A) : 1 \le j \le n\}.$$


Example 3 


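For the matrix

$$A = \begin{pmatrix} 1 & -i & 3-4i \\ -2+i & 0 & 6 \\ 3 & 2 & 1 \end{pmatrix},$$

$\rho_1(A) = 7$, $\rho_2(A) = 6 + \sqrt{5}$, $\rho_3(A) = 6$, $\nu_1(A) = 4 + \sqrt{5}$, $\nu_2(A) = 3$, and $\nu_3(A) = 12$. Hence $\rho(A) = 6 + \sqrt{5}$ and $\nu(A) = 12$.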
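The row and column sums are straightforward to compute with Python's built-in complex numbers (`abs` gives the modulus); the variable names are ours.

```python
import math

A = [[1, -1j, 3 - 4j],
     [-2 + 1j, 0, 6],
     [3, 2, 1]]
n = len(A)

rho = [sum(abs(A[i][j]) for j in range(n)) for i in range(n)]   # row sums rho_i(A)
nu = [sum(abs(A[i][j]) for i in range(n)) for j in range(n)]    # column sums nu_j(A)

print("rho_i:", rho, " rho(A):", max(rho))
print("nu_j: ", nu, " nu(A): ", max(nu))
print("6 + sqrt(5) =", 6 + math.sqrt(5))
```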
Our next results show that the smaller of $\rho(A)$ and $\nu(A)$ is an upper bound for the absolute values of eigenvalues of A. In the preceding example, for instance, A has no eigenvalue with absolute value greater than $6 + \sqrt{5}$.

To obtain a geometric view of the following theorem, we introduce some terminology. For an $n \times n$ matrix A, we define the ith Gerschgorin disk $C_i$ to be the disk in the complex plane with center $A_{ii}$ and radius $r_i = \rho_i(A) - |A_{ii}|$; that is,

$$C_i = \{z \in C : |z - A_{ii}| \le r_i\}.$$


For example, consider the matrix

$$A = \begin{pmatrix} 1+2i & 1 \\ 2i & -3 \end{pmatrix}.$$

For this matrix, $C_1$ is the disk with center $1 + 2i$ and radius 1, and $C_2$ is the disk with center $-3$ and radius 2. (See Figure 5.4.)

[Figure 5.4: the Gerschgorin disks $C_1$ and $C_2$ in the complex plane (real axis and imaginary axis).]

Gerschgorin’s disk theorem, stated below, tells us that all the eigenvalues of A are located within these two disks. In particular, since neither disk contains 0 ($|0 - (1+2i)| = \sqrt{5} > 1$ and $|0 - (-3)| = 3 > 2$), we see that 0 is not an eigenvalue, and hence by Exercise 8(c) of Section 5.1, A is invertible.


Theorem 5.16 (Gerschgorin’s Disk Theorem). Let $A \in M_{n\times n}(C)$. Then every eigenvalue of A is contained in a Gerschgorin disk.

Proof. Let $\lambda$ be an eigenvalue of A with the corresponding eigenvector

$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}.$$

Then v satisfies the matrix equation $Av = \lambda v$, which can be written

$$\sum_{j=1}^{n} A_{ij}v_j = \lambda v_i \quad (i = 1, 2, \ldots, n). \tag{2}$$

Suppose that $v_k$ is the coordinate of v having the largest absolute value; note that $v_k \ne 0$ because v is an eigenvector of A.

We show that $\lambda$ lies in $C_k$, that is, $|\lambda - A_{kk}| \le r_k$. For $i = k$, it follows from (2) that

$$\begin{aligned}
|\lambda v_k - A_{kk}v_k| &= \left|\sum_{j=1}^{n} A_{kj}v_j - A_{kk}v_k\right| = \left|\sum_{j\ne k} A_{kj}v_j\right| \\
&\le \sum_{j\ne k} |A_{kj}||v_j| \le \sum_{j\ne k} |A_{kj}||v_k| \\
&= |v_k|\sum_{j\ne k} |A_{kj}| = |v_k|r_k.
\end{aligned}$$

Thus

$$|v_k||\lambda - A_{kk}| \le |v_k|r_k;$$

so

$$|\lambda - A_{kk}| \le r_k$$

because $|v_k| > 0$. ∎


Corollary 1. Let $\lambda$ be any eigenvalue of $A \in M_{n\times n}(C)$. Then $|\lambda| \le \rho(A)$.

Proof. By Gerschgorin’s disk theorem, $|\lambda - A_{kk}| \le r_k$ for some k. Hence

$$|\lambda| = |(\lambda - A_{kk}) + A_{kk}| \le |\lambda - A_{kk}| + |A_{kk}| \le r_k + |A_{kk}| = \rho_k(A) \le \rho(A).$$

∎

Corollary 2. Let $\lambda$ be any eigenvalue of $A \in M_{n\times n}(C)$. Then

$$|\lambda| \le \min\{\rho(A), \nu(A)\}.$$

Proof. Since $|\lambda| \le \rho(A)$ by Corollary 1, it suffices to show that $|\lambda| \le \nu(A)$. By Exercise 14 of Section 5.1, $\lambda$ is an eigenvalue of $A^t$, and so $|\lambda| \le \rho(A^t)$ by Corollary 1. But the rows of $A^t$ are the columns of A; consequently $\rho(A^t) = \nu(A)$. Therefore $|\lambda| \le \nu(A)$. ∎
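Gerschgorin’s theorem and Corollary 2 can be illustrated on the $2 \times 2$ matrix used for Figure 5.4. This sketch (our own helper names) finds the two eigenvalues as roots of the characteristic polynomial $t^2 - (\operatorname{tr} A)t + \det A$ using `cmath.sqrt`. It then checks that each eigenvalue lies in a Gerschgorin disk and satisfies $|\lambda| \le \min\{\rho(A), \nu(A)\}$.

```python
import cmath

A = [[1 + 2j, 1],
     [2j, -3]]


def gerschgorin_disks(A):
    """List of (center A_ii, radius r_i = rho_i(A) - |A_ii|) for each row i."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]


def eigenvalues_2x2(A):
    """Roots of t^2 - (trace A) t + det A, the characteristic polynomial of a 2x2 matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]


disks = gerschgorin_disks(A)
eigs = eigenvalues_2x2(A)
rho = max(sum(abs(x) for x in row) for row in A)                   # row sum rho(A)
nu = max(sum(abs(A[i][j]) for i in range(2)) for j in range(2))    # column sum nu(A)

for lam in eigs:
    in_disks = [abs(lam - c) <= r for c, r in disks]
    print(f"{lam:.4f}", in_disks, abs(lam) <= min(rho, nu))
```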


The next corollary is immediate from Corollary 2. 
Corollary 3. If $\lambda$ is an eigenvalue of a transition matrix, then $|\lambda| \le 1$.

The next result asserts that the upper bound in Corollary 3 is attained.

Theorem 5.17. Every transition matrix has 1 as an eigenvalue.

Proof. Let A be an $n \times n$ transition matrix, and let $u \in R^n$ be the column vector in which each coordinate is 1. Then $A^tu = u$ by Theorem 5.15, and hence u is an eigenvector of $A^t$ corresponding to the eigenvalue 1. But since A and $A^t$ have the same eigenvalues, it follows that 1 is also an eigenvalue of A. ∎


Suppose that A is a transition matrix for which some eigenvector corresponding to the eigenvalue 1 has only nonnegative coordinates. Then some multiple of this vector is a probability vector P as well as an eigenvector of A corresponding to eigenvalue 1. It is interesting to observe that if P is the initial probability vector of a Markov chain having A as its transition matrix, then the Markov chain is completely static. For in this situation, $A^mP = P$ for every positive integer m; hence the probability of being in each state never changes. Consider, for instance, the city-suburb problem with

$$P = \begin{pmatrix} \frac{1}{6} \\ \frac{5}{6} \end{pmatrix}.$$


Theorem 5.18. Let $A \in M_{n\times n}(C)$ be a matrix in which each entry is positive, and let $\lambda$ be an eigenvalue of A such that $|\lambda| = \rho(A)$. Then $\lambda = \rho(A)$ and $\{u\}$ is a basis for $E_\lambda$, where $u \in C^n$ is the column vector in which each coordinate equals 1.


Proof. Let v be an eigenvector of A corresponding to $\lambda$, with coordinates $v_1, v_2, \ldots, v_n$. Suppose that $v_k$ is the coordinate of v having the largest absolute value, and let $b = |v_k|$. Then

$$\begin{aligned}
|\lambda|b = |\lambda||v_k| = |\lambda v_k| &= \left|\sum_{j=1}^{n} A_{kj}v_j\right| \le \sum_{j=1}^{n} |A_{kj}v_j| \\
&= \sum_{j=1}^{n} |A_{kj}||v_j| \le \sum_{j=1}^{n} |A_{kj}|b = \rho_k(A)b \le \rho(A)b.
\end{aligned} \tag{3}$$

Since $|\lambda| = \rho(A)$, the three inequalities in (3) are actually equalities; that is,

(a) $\left|\sum_{j=1}^{n} A_{kj}v_j\right| = \sum_{j=1}^{n} |A_{kj}v_j|$,

(b) $\sum_{j=1}^{n} |A_{kj}||v_j| = \sum_{j=1}^{n} |A_{kj}|b$, and

(c) $\rho_k(A) = \rho(A)$.

We see in Exercise 15(b) of Section 6.1 that (a) holds if and only if all the terms $A_{kj}v_j$ ($j = 1, 2, \ldots, n$) are nonnegative multiples of some nonzero complex number z. Without loss of generality, we assume that $|z| = 1$. Thus there exist nonnegative real numbers $c_1, c_2, \ldots, c_n$ such that

$$A_{kj}v_j = c_jz. \tag{4}$$

By (b) and the assumption that $A_{kj} \ne 0$ for all k and j, we have

$$|v_j| = b \quad\text{for } j = 1, 2, \ldots, n. \tag{5}$$

Combining (4) and (5), we obtain

$$b = |v_j| = \left|\frac{c_j}{A_{kj}}z\right| = \frac{c_j}{A_{kj}} \quad\text{for } j = 1, 2, \ldots, n,$$

and therefore by (4), we have $v_j = bz$ for all j. So

$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} bz \\ bz \\ \vdots \\ bz \end{pmatrix} = bzu,$$

and hence $\{u\}$ is a basis for $E_\lambda$.

Finally, observe that all of the entries of Au are positive because the same is true for the entries of both A and u. But $Au = \lambda u$, and hence $\lambda > 0$. Therefore, $\lambda = |\lambda| = \rho(A)$. ∎


Corollary 1. Let $A \in M_{n\times n}(C)$ be a matrix in which each entry is positive, and let $\lambda$ be an eigenvalue of A such that $|\lambda| = \nu(A)$. Then $\lambda = \nu(A)$, and $\dim(E_\lambda) = 1$.

Proof. Exercise. ∎

Corollary 2. Let $A \in M_{n\times n}(C)$ be a transition matrix in which each entry is positive, and let $\lambda$ be any eigenvalue of A other than 1. Then $|\lambda| < 1$. Moreover, the eigenspace corresponding to the eigenvalue 1 has dimension 1.

Proof. Exercise. ∎


Our next result extends Corollary 2 to regular transition matrices and thus 
shows that regular transition matrices satisfy condition (a) of Theorems 5.13 
and 5.14. 
Theorem 5.19. Let A be a regular transition matrix, and let $\lambda$ be an eigenvalue of A. Then
(a) $|\lambda| \le 1$.
(b) If $|\lambda| = 1$, then $\lambda = 1$, and $\dim(E_\lambda) = 1$.

Proof. Statement (a) was proved as Corollary 3 to Theorem 5.16.

(b) Since A is regular, there exists a positive integer s such that $A^s$ has only positive entries. Because A is a transition matrix and the entries of $A^s$ are positive, the entries of $A^{s+1} = A^s(A)$ are positive (each column of A sums to 1 and so contains at least one positive entry). Suppose that $|\lambda| = 1$. Then $\lambda^s$ and $\lambda^{s+1}$ are eigenvalues of $A^s$ and $A^{s+1}$, respectively, having absolute value 1. So by Corollary 2 to Theorem 5.18, $\lambda^s = \lambda^{s+1} = 1$. Thus $\lambda = 1$. Let $E_\lambda$ and $E'_\lambda$ denote the eigenspaces of A and $A^s$, respectively, corresponding to $\lambda = 1$. Then $E_\lambda \subseteq E'_\lambda$ and, by Corollary 2 to Theorem 5.18, $\dim(E'_\lambda) = 1$. Hence $E_\lambda = E'_\lambda$, and $\dim(E_\lambda) = 1$. ∎


Corollary. Let A be a regular transition matrix that is diagonalizable. Then $\lim_{m\to\infty} A^m$ exists.


The preceding corollary, which follows immediately from Theorems 5.19 and 5.14, is not the best possible result. In fact, it can be shown that if A is a regular transition matrix, then the multiplicity of 1 as an eigenvalue of A is 1. Thus, by Theorem 5.7 (p. 264), condition (b) of Theorem 5.13 is satisfied. So if A is a regular transition matrix, $\lim_{m\to\infty} A^m$ exists regardless of whether A is or is not diagonalizable. As with Theorem 5.13, however, the fact that the multiplicity of 1 as an eigenvalue of A is 1 cannot be proved at this time. Nevertheless, we state this result here (leaving the proof until Exercise 20 of Section 7.2) and deduce further facts about $\lim_{m\to\infty} A^m$ when A is a regular transition matrix.


Theorem 5.20. Let A be an $n \times n$ regular transition matrix. Then
(a) The multiplicity of 1 as an eigenvalue of A is 1.
(b) $\lim_{m\to\infty} A^m$ exists.
(c) $L = \lim_{m\to\infty} A^m$ is a transition matrix.
(d) $AL = LA = L$.
(e) The columns of L are identical. In fact, each column of L is equal to the unique probability vector v that is also an eigenvector of A corresponding to the eigenvalue 1.
(f) For any probability vector w, $\lim_{m\to\infty}(A^mw) = v$.


Proof. (a) See Exercise 20 of Section 7.2.

(b) This follows from (a) and Theorems 5.19 and 5.13.

(c) The entries of L are nonnegative, being limits of nonnegative numbers, so by Theorem 5.15 we must show that $u^tL = u^t$. Now $A^m$ is a transition matrix by the corollary to Theorem 5.15, so

$$u^tL = u^t\lim_{m\to\infty} A^m = \lim_{m\to\infty} u^tA^m = \lim_{m\to\infty} u^t = u^t,$$

and it follows that L is a transition matrix.

(d) By Theorem 5.12,

$$AL = A\lim_{m\to\infty} A^m = \lim_{m\to\infty} AA^m = \lim_{m\to\infty} A^{m+1} = L.$$

Similarly, $LA = L$.

(e) Since $AL = L$ by (d), each column of L is an eigenvector of A corresponding to the eigenvalue 1. Moreover, by (c), each column of L is a probability vector. Thus, by (a), each column of L is equal to the unique probability vector v corresponding to the eigenvalue 1 of A.

(f) Let w be any probability vector, and set $y = \lim_{m\to\infty} A^mw = Lw$. Then y is a probability vector by the corollary to Theorem 5.15, and also $Ay = ALw = Lw = y$ by (d). Hence y is also an eigenvector corresponding to the eigenvalue 1 of A. So $y = v$ by (e). ∎


Definition. The vector v in Theorem 5.20(e) is called the fixed prob- 
ability vector or stationary vector of the regular transition matrix A. 
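Theorem 5.20(f) suggests a simple numerical way to approximate the fixed probability vector: apply A repeatedly to any probability vector. The sketch below does this in floating point for the regular matrix M of Example 2; the stopping tolerance and the function names are our own choices. Solving $(M - I)v = 0$ by hand gives $v = (20/29, 4/29, 5/29)^t$, and the iterates should approach it whatever the starting vector.

```python
def step(A, p):
    """Return A p."""
    return [sum(A[i][j] * p[j] for j in range(len(p))) for i in range(len(A))]


def limit_of_iterates(A, w, tol=1e-13, max_iter=100_000):
    """Approximate lim A^m w by iterating w -> A w until successive vectors agree to tol."""
    for _ in range(max_iter):
        nxt = step(A, w)
        if max(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    raise RuntimeError("iterates did not settle; the limit may not exist")


M = [[0.9, 0.5, 0.0],
     [0.0, 0.5, 0.4],
     [0.1, 0.0, 0.6]]

for w in ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.2, 0.5, 0.3]):
    print([round(x, 6) for x in limit_of_iterates(M, w)])
```

For a non-regular matrix such as $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ the iterates can oscillate forever, which is why the loop has an iteration cap.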


Theorem 5.20 can be used to deduce information about the eventual dis- 
tribution in each state of a Markov chain having a regular transition matrix. 


Example 4 


A survey in Persia showed that on a particular day 50% of the Persians 
preferred a loaf of bread, 30% preferred a jug of wine, and 20% preferred 
“thou beside me in the wilderness.” A subsequent survey 1 month later 
yielded the following data: Of those who preferred a loaf of bread on the first 
survey, 40% continued to prefer a loaf of bread, 10% now preferred a jug of 
wine, and 50% preferred “thou”; of those who preferred a jug of wine on the 
first survey, 20% now preferred a loaf of bread, 70% continued to prefer a jug 
of wine, and 10% now preferred “thou”; of those who preferred “thou” on the 
first survey, 20% now preferred a loaf of bread, 20% now preferred a jug of 
wine, and 60% continued to prefer “thou.” 


Assuming that this trend continues, the situation described in the preced- 
ing paragraph is a three-state Markov chain in which the states are the three 
possible preferences. We can predict the percentage of Persians in each state 
for each month following the original survey. Letting the first, second, and 
third states be preferences for bread, wine, and “thou”, respectively, we see 
that the probability vector that gives the initial probability of being in each 
state is

$$P = \begin{pmatrix} 0.50 \\ 0.30 \\ 0.20 \end{pmatrix},$$

and the transition matrix is

$$A = \begin{pmatrix} 0.40 & 0.20 & 0.20 \\ 0.10 & 0.70 & 0.20 \\ 0.50 & 0.10 & 0.60 \end{pmatrix}.$$


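The probabilities of being in each state m months after the original survey are the coordinates of the vector $A^mP$. The reader may check that

$$AP = \begin{pmatrix} 0.30 \\ 0.30 \\ 0.40 \end{pmatrix},\quad A^2P = \begin{pmatrix} 0.26 \\ 0.32 \\ 0.42 \end{pmatrix},\quad A^3P = \begin{pmatrix} 0.252 \\ 0.334 \\ 0.414 \end{pmatrix},\quad\text{and}\quad A^4P = \begin{pmatrix} 0.2504 \\ 0.3418 \\ 0.4078 \end{pmatrix}.$$

Note the apparent convergence of $A^mP$.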
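The four products $A^mP$ can be reproduced exactly. This sketch builds `Fraction` values from the decimal strings of the example; the helper name `step` is ours.

```python
from fractions import Fraction as F


def step(A, p):
    """Return A p for a matrix stored as a list of rows."""
    return [sum(A[i][j] * p[j] for j in range(len(p))) for i in range(len(A))]


# States: 1 bread, 2 wine, 3 "thou".
A = [[F("0.40"), F("0.20"), F("0.20")],
     [F("0.10"), F("0.70"), F("0.20")],
     [F("0.50"), F("0.10"), F("0.60")]]
P = [F("0.50"), F("0.30"), F("0.20")]

iterates = [P]
for m in range(1, 5):
    iterates.append(step(A, iterates[-1]))   # iterates[m] == A^m P
    print(m, [str(float(x)) for x in iterates[m]])
```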


Since A is regular, the long-range prediction concerning the Persians’ preferences can be found by computing the fixed probability vector for A. This vector is the unique probability vector v such that $(A - I)v = 0$. Letting

$$v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix},$$

we see that the matrix equation $(A - I)v = 0$ yields the following system of linear equations:

$$\begin{aligned} -0.60v_1 + 0.20v_2 + 0.20v_3 &= 0 \\ 0.10v_1 - 0.30v_2 + 0.20v_3 &= 0 \\ 0.50v_1 + 0.10v_2 - 0.40v_3 &= 0. \end{aligned}$$

It is easily shown that

$$\left\{\begin{pmatrix} 5 \\ 7 \\ 8 \end{pmatrix}\right\}$$

is a basis for the solution space of this system. Hence the unique fixed probability vector for A is

$$\begin{pmatrix} \frac{5}{5+7+8} \\ \frac{7}{5+7+8} \\ \frac{8}{5+7+8} \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.35 \\ 0.40 \end{pmatrix}.$$

Thus, in the long run, 25% of the Persians prefer a loaf of bread, 35% prefer a jug of wine, and 40% prefer “thou beside me in the wilderness.”
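The fixed probability vector can also be found mechanically. Because each column of $A - I$ sums to 0, the last equation of $(A - I)v = 0$ is the negative of the sum of the others, so it may be replaced by $v_1 + v_2 + \cdots + v_n = 1$. When the eigenspace for 1 is one-dimensional (as Theorem 5.20 guarantees for regular A), the resulting square system has exactly one solution. The sketch below solves it by Gauss-Jordan elimination in exact arithmetic; the function names are ours.

```python
from fractions import Fraction as F


def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination (M assumed nonsingular)."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]            # scale pivot row to 1
        for r in range(n):
            if r != col and aug[r][col] != 0:            # clear the rest of the column
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fixed_probability_vector(A):
    """Unique probability vector v with (A - I) v = 0, assuming dim E_1 = 1."""
    n = len(A)
    M = [[A[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    M[-1] = [F(1)] * n                 # replace last equation by v_1 + ... + v_n = 1
    b = [F(0)] * (n - 1) + [F(1)]
    return solve(M, b)


A = [[F("0.40"), F("0.20"), F("0.20")],
     [F("0.10"), F("0.70"), F("0.20")],
     [F("0.50"), F("0.10"), F("0.60")]]
print([str(x) for x in fixed_probability_vector(A)])   # ['1/4', '7/20', '2/5']
```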
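Note that if

$$Q = \begin{pmatrix} 5 & 0 & -3 \\ 7 & -1 & -1 \\ 8 & 1 & 4 \end{pmatrix},$$

then

$$Q^{-1}AQ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.2 \end{pmatrix}.$$

So

$$\lim_{m\to\infty} A^m = Q\left[\lim_{m\to\infty}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.2 \end{pmatrix}^m\right]Q^{-1} = Q\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}Q^{-1} = \begin{pmatrix} 0.25 & 0.25 & 0.25 \\ 0.35 & 0.35 & 0.35 \\ 0.40 & 0.40 & 0.40 \end{pmatrix}.$$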


Example 5 


Farmers in Lamron plant one crop per year—either corn, soybeans, or wheat. 
Because they believe in the necessity of rotating their crops, these farmers do 
not plant the same crop in successive years. In fact, of the total acreage on 
which a particular crop is planted, exactly half is planted with each of the 
other two crops during the succeeding year. This year, 300 acres of corn, 200 
acres of soybeans, and 100 acres of wheat were planted. 


The situation just described is another three-state Markov chain in which 
the three states correspond to the planting of corn, soybeans, and wheat, 
respectively. In this problem, however, the amount of land devoted to each 
crop, rather than the percentage of the total acreage (600 acres), is given. By 
converting these amounts into fractions of the total acreage, we see that the 
transition matrix A and the initial probability vector P of the Markov chain 
are 


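$$A = \begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} \frac{300}{600} \\ \frac{200}{600} \\ \frac{100}{600} \end{pmatrix} = \begin{pmatrix} \frac{1}{2} \\ \frac{1}{3} \\ \frac{1}{6} \end{pmatrix}.$$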


The fraction of the total acreage devoted to each crop in m years is given by the coordinates of $A^mP$, and the eventual proportions of the total acreage used for each crop are the coordinates of $\lim_{m\to\infty} A^mP$. Thus the eventual amounts of land devoted to each crop are found by multiplying this limit by the total acreage; that is, the eventual amounts of land used for each crop are the coordinates of $600 \cdot \lim_{m\to\infty} A^mP$.

Since A is a regular transition matrix (every entry of $A^2$ is positive), Theorem 5.20 shows that $\lim_{m\to\infty} A^m$ is a matrix L in which each column equals the unique fixed probability vector for A. It is easily seen that the fixed probability vector for A is

$$\begin{pmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{pmatrix}.$$

Hence

$$L = \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{pmatrix};$$

so

$$600 \cdot \lim_{m\to\infty} A^mP = 600LP = \begin{pmatrix} 200 \\ 200 \\ 200 \end{pmatrix}.$$

Thus, in the long run, we expect 200 acres of each crop to be planted each year. (For a direct computation of $600 \cdot \lim_{m\to\infty} A^mP$, see Exercise 14.) ♦
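A direct check of the long-run acreage in exact arithmetic (our own helper names). The printed values approach 200 acres per crop. One can verify by hand that $600A^mP = (200 + 100(-\tfrac12)^m,\ 200,\ 200 - 100(-\tfrac12)^m)^t$, because $(1,1,1)^t$ and $(1,0,-1)^t$ are eigenvectors of A for the eigenvalues 1 and $-\tfrac12$; the test below checks this formula.

```python
from fractions import Fraction as F


def step(A, p):
    """Return A p for a matrix stored as a list of rows."""
    return [sum(A[i][j] * p[j] for j in range(len(p))) for i in range(len(A))]


h = F(1, 2)
A = [[0, h, h],
     [h, 0, h],
     [h, h, 0]]          # states: corn, soybeans, wheat
P = [F(300, 600), F(200, 600), F(100, 600)]


def acres_after(m):
    """600 * A^m P: acres of corn, soybeans, and wheat after m years."""
    p = P
    for _ in range(m):
        p = step(A, p)
    return [600 * x for x in p]


for m in (1, 2, 5, 10, 20):
    print(m, [float(x) for x in acres_after(m)])
```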
In this section, we have concentrated primarily on the theory of regular transition matrices. There is another interesting class of transition matrices that can be represented in the form

$$\begin{pmatrix} I & B \\ O & C \end{pmatrix},$$

where I is an identity matrix and O is a zero matrix. (Such transition matrices are not regular since the lower left block remains O in any power of the matrix.) The states corresponding to the identity submatrix are called
absorbing states because such a state is never left once it is entered. A 
Markov chain is called an absorbing Markov chain if it is possible to go 
from an arbitrary state into an absorbing state in a finite number of stages. 
Observe that the Markov chain that describes the enrollment pattern in a 
community college is an absorbing Markov chain with states 1 and 4 as its ab- 
sorbing states. Readers interested in learning more about absorbing Markov 
chains are referred to Introduction to Finite Mathematics (third edition) by
J. Kemeny, J. Snell, and G. Thompson (Prentice-Hall, Inc., Englewood Cliffs,
N. J., 1974) or Discrete Mathematical Models by Fred S. Roberts (Prentice- 
Hall, Inc., Englewood Cliffs, N. J., 1976). 
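The claim that a matrix of the form $\begin{pmatrix} I & B \\ O & C \end{pmatrix}$ is never regular can be observed directly by computing its powers. The sketch below assumes NumPy. It uses a hypothetical 4-state transition matrix in which states 1 and 2 are absorbing. The blocks B and C and the helper `lower_left_stays_zero` are made up for illustration.

```python
import numpy as np

# Hypothetical 4-state transition matrix in block form (I B; O C):
# states 1 and 2 are absorbing, states 3 and 4 are not.
I = np.eye(2)
B = np.array([[0.3, 0.1],
              [0.2, 0.4]])
C = np.array([[0.4, 0.2],
              [0.1, 0.3]])
O = np.zeros((2, 2))
T = np.block([[I, B],
              [O, C]])

# Each column sums to 1, so T is a transition matrix.
print(np.allclose(T.sum(axis=0), 1.0))


def lower_left_stays_zero(T, k, max_power=50):
    """Check that the lower-left (n-k) x k block of T^m is zero for
    m = 1, ..., max_power (hypothetical helper)."""
    M = np.eye(T.shape[0])
    for _ in range(max_power):
        M = M @ T
        if np.any(M[k:, :k] != 0):
            return False
    return True


# Some entries of every power are zero, so T is not regular.
print(lower_left_stays_zero(T, k=2))
```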


An Application 


In species that reproduce sexually, the characteristics of an offspring with 
respect to a particular genetic trait are determined by a pair of genes, one 
inherited from each parent. The genes for a particular trait are of two types, 
which are denoted by G and g. The gene G represents the dominant char- 
acteristic, and g represents the recessive characteristic. Offspring with geno- 
types GG or Gg exhibit the dominant characteristic, whereas offspring with 
genotype gg exhibit the recessive characteristic. For example, in humans, 
brown eyes are a dominant characteristic and blue eyes are the correspond- 
ing recessive characteristic; thus the offspring with genotypes GG or Gg are 
brown-eyed, whereas those of type gg are blue-eyed. 

Let us consider the probability of offspring of each genotype for a male
parent of genotype Gg. (We assume that the population under consideration
is large, that mating is random with respect to genotype, and that the distri-
bution of each genotype within the population is independent of sex and life
expectancy.) Let

$$P = \begin{pmatrix} p \\ q \\ r \end{pmatrix}$$

denote the proportion of the adult population with genotypes GG, Gg, and
gg, respectively, at the start of the experiment. This experiment describes a
three-state Markov chain with the following transition matrix:


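(The columns are indexed by the genotype of the female parent and the rows by
the genotype of the offspring, each in the order GG, Gg, gg.)

$$B = \begin{pmatrix} \tfrac12 & \tfrac14 & 0 \\ \tfrac12 & \tfrac12 & \tfrac12 \\ 0 & \tfrac14 & \tfrac12 \end{pmatrix}$$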

It is easily checked that $B^2$ contains only positive entries; so B is regular.
Thus, by permitting only males of genotype Gg to reproduce, the proportion
of offspring in the population having a certain genotype will stabilize at the
fixed probability vector for B, which is

$$\begin{pmatrix} \tfrac14 \\ \tfrac12 \\ \tfrac14 \end{pmatrix}.$$
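Both claims can be checked numerically. The following is a minimal sketch that assumes NumPy. The function `fixed_probability_vector` is a hypothetical helper: it takes an eigenvector of B for the eigenvalue 1 and rescales it so that its entries sum to 1.

```python
import numpy as np

# Transition matrix B for a male parent of genotype Gg
# (columns: female parent GG, Gg, gg; rows: offspring GG, Gg, gg).
B = np.array([[0.5, 0.25, 0.0],
              [0.5, 0.5,  0.5],
              [0.0, 0.25, 0.5]])


def fixed_probability_vector(M):
    """Return the probability vector v with Mv = v (hypothetical helper):
    an eigenvector for the eigenvalue closest to 1, rescaled to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(M)
    i = np.argmin(np.abs(eigvals - 1.0))
    v = np.real(eigvecs[:, i])
    return v / v.sum()


print(np.all(np.linalg.matrix_power(B, 2) > 0))  # B^2 has only positive entries
print(fixed_probability_vector(B))               # approximately [0.25, 0.5, 0.25]
```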
Now suppose that similar experiments are to be performed with males of
genotypes GG and gg. As already mentioned, these experiments are three-
state Markov chains with transition matrices

$$A = \begin{pmatrix} 1 & \tfrac12 & 0 \\ 0 & \tfrac12 & 1 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} 0 & 0 & 0 \\ 1 & \tfrac12 & 0 \\ 0 & \tfrac12 & 1 \end{pmatrix},$$

respectively. In order to consider the case where all male genotypes are per-
mitted to reproduce, we must form the transition matrix M = pA + qB + rC,
which is the linear combination of A, B, and C weighted by the proportion
of males of each genotype. Thus

$$M = \begin{pmatrix} p + \tfrac12 q & \tfrac12 p + \tfrac14 q & 0 \\ \tfrac12 q + r & \tfrac12 p + \tfrac12 q + \tfrac12 r & p + \tfrac12 q \\ 0 & \tfrac14 q + \tfrac12 r & \tfrac12 q + r \end{pmatrix}.$$


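To simplify the notation, let $a = p + \tfrac12 q$ and $b = \tfrac12 q + r$. (The numbers a and
b represent the proportions of G and g genes, respectively, in the population.)
Then

$$M = \begin{pmatrix} a & \tfrac12 a & 0 \\ b & \tfrac12 & a \\ 0 & \tfrac12 b & b \end{pmatrix},$$

where $a + b = p + q + r = 1$.
Let p', q', and r' denote the proportions of the first-generation offspring
having genotypes GG, Gg, and gg, respectively. Then

$$\begin{pmatrix} p' \\ q' \\ r' \end{pmatrix} = MP = \begin{pmatrix} ap + \tfrac12 aq \\ bp + \tfrac12 q + ar \\ \tfrac12 bq + br \end{pmatrix} = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix}.$$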


In order to consider the effects of unrestricted matings among the first-
generation offspring, a new transition matrix $\overline{M}$ must be determined based
upon the distribution of first-generation genotypes. As before, we find that

$$\overline{M} = \begin{pmatrix} p' + \tfrac12 q' & \tfrac12 p' + \tfrac14 q' & 0 \\ \tfrac12 q' + r' & \tfrac12 p' + \tfrac12 q' + \tfrac12 r' & p' + \tfrac12 q' \\ 0 & \tfrac14 q' + \tfrac12 r' & \tfrac12 q' + r' \end{pmatrix} = \begin{pmatrix} a' & \tfrac12 a' & 0 \\ b' & \tfrac12 & a' \\ 0 & \tfrac12 b' & b' \end{pmatrix},$$

where $a' = p' + \tfrac12 q'$ and $b' = \tfrac12 q' + r'$. However,

$$a' = a^2 + \tfrac12(2ab) = a(a + b) = a \quad\text{and}\quad b' = \tfrac12(2ab) + b^2 = b(a + b) = b.$$

Thus $\overline{M} = M$; so the distribution of second-generation offspring among
the three genotypes is


$$M(MP) = M^2P = \begin{pmatrix} a^3 + a^2b \\ a^2b + ab + ab^2 \\ ab^2 + b^3 \end{pmatrix} = \begin{pmatrix} a^2(a + b) \\ ab(a + 1 + b) \\ b^2(a + b) \end{pmatrix} = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix} = MP,$$

the same as the first-generation offspring. In other words, MP is the fixed
probability vector for M, and genetic equilibrium is achieved in the population
after only one generation. (This result is called the Hardy-Weinberg law.)
Notice that in the important special case that a = b (or equivalently, that
p = r), the distribution at equilibrium is

$$MP = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix} = \begin{pmatrix} \tfrac14 \\ \tfrac12 \\ \tfrac14 \end{pmatrix}.$$
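The Hardy-Weinberg computation can be replayed numerically. The sketch below assumes NumPy. The starting proportions (p, q, r) = (0.5, 0.2, 0.3) and the helper `mating_matrix` are hypothetical choices made for illustration. Following the text, it rebuilds the mating matrix from each generation's own proportions and confirms that the second generation equals the first.

```python
import numpy as np

# Columns: genotype of the female parent (GG, Gg, gg);
# rows: genotype of the offspring (GG, Gg, gg).
A = np.array([[1.0, 0.5, 0.0],    # male parent GG
              [0.0, 0.5, 1.0],
              [0.0, 0.0, 0.0]])
B = np.array([[0.5, 0.25, 0.0],   # male parent Gg
              [0.5, 0.5,  0.5],
              [0.0, 0.25, 0.5]])
C = np.array([[0.0, 0.0, 0.0],    # male parent gg
              [1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])


def mating_matrix(P):
    """Return M = pA + qB + rC for proportions P = (p, q, r) (hypothetical helper)."""
    p, q, r = P
    return p * A + q * B + r * C


P0 = np.array([0.5, 0.2, 0.3])    # hypothetical starting proportions (p, q, r)
P1 = mating_matrix(P0) @ P0       # first generation
P2 = mating_matrix(P1) @ P1       # second generation, using the new matrix

a = P0[0] + P0[1] / 2             # proportion of G genes
b = P0[1] / 2 + P0[2]             # proportion of g genes
print(P1, [a**2, 2 * a * b, b**2])  # first generation equals (a^2, 2ab, b^2)
print(np.allclose(P1, P2))          # equilibrium after one generation
```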

EXERCISES 


1. Label the following statements as true or false. 


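(a) If $A \in M_{n\times n}(C)$ and $\lim_{m\to\infty} A^m = L$, then, for any invertible matrix
$Q \in M_{n\times n}(C)$, we have $\lim_{m\to\infty} QA^mQ^{-1} = QLQ^{-1}$.
(b) If 2 is an eigenvalue of $A \in M_{n\times n}(C)$, then $\lim_{m\to\infty} A^m$ does not
exist.
(c) Any vector

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

such that $x_1 + x_2 + \cdots + x_n = 1$ is a probability vector.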
(d) The sum of the entries of each row of a transition matrix equals 1. 
(e) The product of a transition matrix and a probability vector is a 
probability vector. 
(f) Let z be any complex number such that |z| < 1. Then the matrix

$$\begin{pmatrix} 1 & z & -1 \\ z & 1 & 1 \\ -1 & 1 & z \end{pmatrix}$$

does not have 3 as an eigenvalue.
(g) Every transition matrix has 1 as an eigenvalue.
(h) No transition matrix can have −1 as an eigenvalue.
(i) If A is a transition matrix, then $\lim_{m\to\infty} A^m$ exists.
(j) If A is a regular transition matrix, then $\lim_{m\to\infty} A^m$ exists and has
rank 1.


2. Determine whether $\lim_{m\to\infty} A^m$ exists for each of the following matrices
A, and compute the limit if it exists.
0.1 0.7 —14 0.8 0.4 0.7 
(a) fe i (b) (a 7 (<) te 1) 


@(ts 3a) CE 3)  T8) 


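$$\text{(g)}\ \begin{pmatrix} -1.8 & 0 & -1.4 \\ -5.6 & 1 & -2.8 \\ 2.8 & 0 & 2.4 \end{pmatrix} \qquad \text{(h)}\ \begin{pmatrix} 3.4 & -0.2 & 0.8 \\ 3.9 & 1.8 & 1.3 \\ -16.5 & -2.0 & -4.5 \end{pmatrix}$$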
-$-% 46 £+5% 
(i) | 142) -3i -1-4% 
-1-2i 4i 14+ 5i 
26 +1 28 — 41 
28 
3 3 
; T+ 20 O+¢ 
7-21 
(i) F F i 
13 + 62 5+62 35-20% 
6 6 6 
3. Prove that if $A_1, A_2, \ldots$ is a sequence of n × p matrices with complex
entries such that $\lim_{m\to\infty} A_m = L$, then $\lim_{m\to\infty} (A_m)^t = L^t$.


4. Prove that if $A \in M_{n\times n}(C)$ is diagonalizable and $L = \lim_{m\to\infty} A^m$ exists,
then either $L = I_n$ or $\operatorname{rank}(L) < n$.
5. Find 2 × 2 matrices A and B having real entries such that $\lim_{m\to\infty} A^m$,
$\lim_{m\to\infty} B^m$, and $\lim_{m\to\infty} (AB)^m$ all exist, but

$$\lim_{m\to\infty} (AB)^m \neq \left(\lim_{m\to\infty} A^m\right)\left(\lim_{m\to\infty} B^m\right).$$


6. A hospital trauma unit has determined that 30% of its patients are
ambulatory and 70% are bedridden at the time of arrival at the hospital. 
A month after arrival, 60% of the ambulatory patients have recovered, 
20% remain ambulatory, and 20% have become bedridden. After the 
same amount of time, 10% of the bedridden patients have recovered, 
20% have become ambulatory, 50% remain bedridden, and 20% have 
died. Determine the percentages of patients who have recovered, are 
ambulatory, are bedridden, and have died 1 month after arrival. Also 
determine the eventual percentages of patients of each type. 


7. A player begins a game of chance by placing a marker in box 2, marked
Start. (See Figure 5.5.) A die is rolled, and the marker is moved one
square to the left if a 1 or a 2 is rolled and one square to the right if a
3, 4, 5, or 6 is rolled. This process continues until the marker lands in
square 1, in which case the player wins the game, or in square 4, in which
case the player loses the game. What is the probability of winning this
game? Hint: Instead of diagonalizing the appropriate transition matrix
A, it is easier to represent $e_2$ as a linear combination of eigenvectors of
A and then apply $A^m$ to the result.

[Figure 5.5: four squares in a row, numbered 1 to 4; square 1 is marked Win, square 2 Start, and square 4 Lose.]


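8. Which of the following transition matrices are regular?

$$\text{(a)}\ \begin{pmatrix} 0.2 & 0.3 & 0.5 \\ 0.3 & 0.2 & 0.5 \\ 0.5 & 0.5 & 0 \end{pmatrix} \qquad \text{(b)}\ \begin{pmatrix} 0.5 & 0 & 1 \\ 0.5 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \qquad \text{(c)}\ \begin{pmatrix} 0.5 & 0 & 0 \\ 0.5 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

$$\text{(d)}\ \begin{pmatrix} 0.5 & 0 & 1 \\ 0.5 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad \text{(e)}\ \begin{pmatrix} \tfrac13 & 0 & 0 \\ \tfrac13 & 1 & 0 \\ \tfrac13 & 0 & 1 \end{pmatrix} \qquad \text{(f)}\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.7 & 0.2 \\ 0 & 0.3 & 0.8 \end{pmatrix}$$

$$\text{(g)}\ \begin{pmatrix} 0 & \tfrac12 & 0 & 0 \\ \tfrac12 & 0 & 0 & 0 \\ \tfrac14 & \tfrac14 & 1 & 0 \\ \tfrac14 & \tfrac14 & 0 & 1 \end{pmatrix} \qquad \text{(h)}\ \begin{pmatrix} \tfrac14 & \tfrac14 & 0 & 0 \\ \tfrac14 & \tfrac14 & 0 & 0 \\ \tfrac14 & \tfrac14 & 1 & 0 \\ \tfrac14 & \tfrac14 & 0 & 1 \end{pmatrix}$$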


9. Compute $\lim_{m\to\infty} A^m$ if it exists, for each matrix A in Exercise 8.


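10. Each of the matrices that follow is a regular transition matrix for a
three-state Markov chain. In all cases, the initial probability vector is

$$P = \begin{pmatrix} 0.3 \\ 0.3 \\ 0.4 \end{pmatrix}.$$

For each transition matrix, compute the proportions of objects in each
state after two stages and the eventual proportions of objects in each
state by determining the fixed probability vector.

$$\text{(a)}\ \begin{pmatrix} 0.6 & 0.1 & 0.1 \\ 0.1 & 0.9 & 0.2 \\ 0.3 & 0 & 0.7 \end{pmatrix} \qquad \text{(b)}\ \begin{pmatrix} 0.8 & 0.1 & 0.2 \\ 0.1 & 0.8 & 0.2 \\ 0.1 & 0.1 & 0.6 \end{pmatrix} \qquad \text{(c)}\ \begin{pmatrix} 0.9 & 0.1 & 0.1 \\ 0.1 & 0.6 & 0.1 \\ 0 & 0.3 & 0.8 \end{pmatrix}$$

$$\text{(d)}\ \begin{pmatrix} 0.4 & 0.2 & 0.2 \\ 0.1 & 0.7 & 0.2 \\ 0.5 & 0.1 & 0.6 \end{pmatrix} \qquad \text{(e)}\ \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.5 & 0.3 \\ 0.3 & 0.2 & 0.5 \end{pmatrix} \qquad \text{(f)}\ \begin{pmatrix} 0.6 & 0 & 0.4 \\ 0.2 & 0.8 & 0.2 \\ 0.2 & 0.2 & 0.4 \end{pmatrix}$$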


11. In 1940, a county land-use survey showed that 10% of the county land
was urban, 50% was unused, and 40% was agricultural. Five years later, 
a follow-up survey revealed that 70% of the urban land had remained 
urban, 10% had become unused, and 20% had become agricultural. 
Likewise, 20% of the unused land had become urban, 60% had remained 
unused, and 20% had become agricultural. Finally, the 1945 survey 
showed that 20% of the agricultural land had become unused while 
80% remained agricultural. Assuming that the trends indicated by the 
1945 survey continue, compute the percentages of urban, unused, and 
agricultural land in the county in 1950 and the corresponding eventual 
percentages. 


12. A diaper liner is placed in each diaper worn by a baby. If, after a
diaper change, the liner is soiled, then it is discarded and replaced by a
new liner. Otherwise, the liner is washed with the diapers and reused,
except that each liner is discarded and replaced after its third use (even
if it has never been soiled). The probability that the baby will soil any
diaper liner is one-third. If there are only new diaper liners at first,
eventually what proportions of the diaper liners being used will be new,
once used, and twice used? Hint: Assume that a diaper liner ready for
use is in one of three states: new, once used, and twice used. After its
use, it then transforms into one of the three states described.


13. In 1975, the automobile industry determined that 40% of American car
owners drove large cars, 20% drove intermediate-sized cars, and 40% 
drove small cars. A second survey in 1985 showed that 70% of the large- 
car owners in 1975 still owned large cars in 1985, but 30% had changed 
to an intermediate-sized car. Of those who owned intermediate-sized 
cars in 1975, 10% had switched to large cars, 70% continued to drive 
intermediate-sized cars, and 20% had changed to small cars in 1985. 
Finally, of the small-car owners in 1975, 10% owned intermediate-sized 
cars and 90% owned small cars in 1985. Assuming that these trends 
continue, determine the percentages of Americans who own cars of each 
size in 1995 and the corresponding eventual percentages. 


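14. Show that if A and P are as in Example 5, then

$$A^m = \begin{pmatrix} r_m & r_{m+1} & r_{m+1} \\ r_{m+1} & r_m & r_{m+1} \\ r_{m+1} & r_{m+1} & r_m \end{pmatrix},$$

where

$$r_m = \frac13\left[1 + \frac{(-1)^m}{2^{m-1}}\right].$$

Deduce that

$$600(A^mP) = A^m\begin{pmatrix} 300 \\ 200 \\ 100 \end{pmatrix} = \begin{pmatrix} 200 + \dfrac{(-1)^m}{2^m}(100) \\ 200 \\ 200 + \dfrac{(-1)^{m+1}}{2^m}(100) \end{pmatrix}.$$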


15. Prove that if a 1-dimensional subspace W of $R^n$ contains a nonzero vector
with all nonnegative entries, then W contains a unique probability
vector.


16. Prove Theorem 5.15 and its corollary.
17. Prove the two corollaries of Theorem 5.18.
18. Prove the corollary of Theorem 5.19.


19. Suppose that M and M' are n × n transition matrices.
(a) Prove that if M is regular, N is any n × n transition matrix, and
c is a real number such that 0 < c < 1, then cM + (1 − c)N is a
regular transition matrix.

(b) Suppose that for all i, j, we have that $M'_{ij} > 0$ whenever $M_{ij} > 0$.
Prove that there exists a transition matrix N and a real number c
with 0 < c < 1 such that M' = cM + (1 − c)N.

(c) Deduce that if the nonzero entries of M and M' occur in the same
positions, then M is regular if and only if M' is regular.


The following definition is used in Exercises 20-24. 


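Definition. For $A \in M_{n\times n}(C)$, define $e^A = \lim_{m\to\infty} B_m$, where

$$B_m = I + A + \frac{A^2}{2!} + \cdots + \frac{A^m}{m!}.$$

Thus

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots,$$

and $B_m$ is the mth partial sum of this series. (Note the analogy with the
power series

$$e^a = 1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots,$$

which is valid for all complex numbers a.)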
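The partial sums $B_m$ can be computed directly. In this sketch, which assumes NumPy, each term $A^k/k!$ is obtained from the previous term by multiplying by A and dividing by k. The name `exp_partial_sum` is ours. For the test matrix $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, we have $A^2 = -I$, so $e^A = (\cos 1)I + (\sin 1)A$. This gives a closed form to compare against.

```python
import numpy as np


def exp_partial_sum(A, m):
    """Return B_m = I + A + A^2/2! + ... + A^m/m! (hypothetical helper name)."""
    n = A.shape[0]
    term = np.eye(n, dtype=np.result_type(A, float))  # A^0 / 0!
    total = term.copy()
    for k in range(1, m + 1):
        term = term @ A / k  # A^k / k! from A^(k-1) / (k-1)!
        total = total + term
    return total


A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
approx = exp_partial_sum(A, 30)
expected = np.array([[np.cos(1), np.sin(1)],
                     [-np.sin(1), np.cos(1)]])
print(np.allclose(approx, expected))
```

Truncating at a fixed m is only an illustration of the definition. SciPy's `scipy.linalg.expm` computes $e^A$ with a more robust algorithm.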


20. Compute $e^O$ and $e^I$, where O and I denote the n × n zero and identity
matrices, respectively.


21. Let $P^{-1}AP = D$ be a diagonal matrix. Prove that $e^A = Pe^DP^{-1}$.


22. Let $A \in M_{n\times n}(C)$ be diagonalizable. Use the result of Exercise 21 to
show that $e^A$ exists. (Exercise 21 of Section 7.2 shows that $e^A$ exists
for every $A \in M_{n\times n}(C)$.)


23. Find $A, B \in M_{2\times 2}(R)$ such that $e^Ae^B \neq e^{A+B}$.


24. Prove that a differentiable function $x\colon R \to R^n$ is a solution to the
system of differential equations defined in Exercise 15 of Section 5.2 if
and only if $x(t) = e^{tA}v$ for some $v \in R^n$, where A is defined in that
exercise.


Sec. 5.4 Invariant Subspaces and the Cayley-Hamilton Theorem 313 


5.4 INVARIANT SUBSPACES AND THE CAYLEY-HAMILTON 
THEOREM 


In Section 5.1, we observed that if v is an eigenvector of a linear operator 
T, then T maps the span of {v} into itself. Subspaces that are mapped into 
themselves are of great importance in the study of linear operators (see, e.g., 
Exercises 28-32 of Section 2.1). 


Definition. Let T be a linear operator on a vector space V. A subspace
W of V is called a T-invariant subspace of V if $T(W) \subseteq W$, that is, if
$T(v) \in W$ for all $v \in W$.


Example 1 


Suppose that T is a linear operator on a vector space V. Then the following
subspaces of V are T-invariant:

1. {0}
2. V
3. R(T)
4. N(T)
5. $E_\lambda$, for any eigenvalue λ of T.

The proofs that these subspaces are T-invariant are left as exercises. (See
Exercise 3.) ♦


Example 2 
Let T be the linear operator on $R^3$ defined by

$$T(a, b, c) = (a + b, b + c, 0).$$

Then the xy-plane $= \{(x, y, 0) : x, y \in R\}$ and the x-axis $= \{(x, 0, 0) : x \in R\}$
are T-invariant subspaces of $R^3$.
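Because T is linear, a subspace W = span(S) is T-invariant exactly when T maps each vector of S into W. This can be tested by checking that appending T(w) to the spanning vectors does not raise the rank. The following is a minimal sketch that assumes NumPy; the helper `is_invariant` is hypothetical. It is applied to the operator of Example 2, with span({e3}) included for contrast, since $T(e_3) = (0, 1, 0)$ lies outside it.

```python
import numpy as np


def T(v):
    """The operator T(a, b, c) = (a + b, b + c, 0)."""
    a, b, c = v
    return np.array([a + b, b + c, 0.0])


def is_invariant(T, basis):
    """Check T(W) ⊆ W for W = span(basis) (hypothetical helper): appending
    T(w) for each spanning vector w must not increase the rank."""
    W = np.column_stack(basis)
    r = np.linalg.matrix_rank(W)
    return all(np.linalg.matrix_rank(np.column_stack([W, T(w)])) == r
               for w in basis)


e1, e2, e3 = np.eye(3)
print(is_invariant(T, [e1, e2]))  # {(x, y, 0) : x, y in R}
print(is_invariant(T, [e1]))      # {(x, 0, 0) : x in R}
print(is_invariant(T, [e3]))      # span({e3}): not invariant
```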


Let T be a linear operator on a vector space V, and let x be a nonzero 
vector in V. The subspace 


$$W = \operatorname{span}(\{x, T(x), T^2(x), \ldots\})$$


is called the T-cyclic subspace of V generated by x. It is a simple matter
to show that W is T-invariant. In fact, W is the “smallest” T-invariant sub- 
space of V containing x. That is, any T-invariant subspace of V containing x 
must also contain W (see Exercise 11). Cyclic subspaces have various uses. 
We apply them in this section to establish the Cayley—Hamilton theorem. In 
Exercise 31, we outline a method for using cyclic subspaces to compute the 
characteristic polynomial of a linear operator without resorting to determi- 
nants. Cyclic subspaces also play an important role in Chapter 7, where we 
study matrix representations of nondiagonalizable linear operators. 
Example 3
Let T be the linear operator on $R^3$ defined by
$$T(a, b, c) = (-b + c, a + c, 3c).$$
We determine the T-cyclic subspace generated by $e_1 = (1, 0, 0)$. Since
$$T(e_1) = T(1, 0, 0) = (0, 1, 0) = e_2$$
and
$$T^2(e_1) = T(T(e_1)) = T(e_2) = (-1, 0, 0) = -e_1,$$
it follows that


$$\operatorname{span}(\{e_1, T(e_1), T^2(e_1), \ldots\}) = \operatorname{span}(\{e_1, e_2\}) = \{(s, t, 0) : s, t \in R\}.$$
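The definition suggests a direct procedure for finding a basis of a T-cyclic subspace: keep applying T until the new vector depends on the previous ones. Theorem 5.22(a), later in this section, justifies this stopping rule. The following is a minimal sketch that assumes NumPy, with T given by its matrix in the standard basis. The helper `cyclic_basis` and its rank tolerance are our own choices, and v is assumed to be nonzero. Applied to Example 3, it stops after $e_1$ and $e_2$.

```python
import numpy as np


def cyclic_basis(A, v, tol=1e-10):
    """Return v, Av, A^2 v, ... up to (not including) the first vector that is a
    linear combination of its predecessors (hypothetical helper; v nonzero)."""
    basis = [np.asarray(v, dtype=float)]
    while True:
        w = A @ basis[-1]
        candidate = np.column_stack(basis + [w])
        if np.linalg.matrix_rank(candidate, tol=tol) < len(basis) + 1:
            return basis
        basis.append(w)


# Matrix of T(a, b, c) = (-b + c, a + c, 3c) in the standard basis.
A = np.array([[0.0, -1.0, 1.0],
              [1.0,  0.0, 1.0],
              [0.0,  0.0, 3.0]])
basis = cyclic_basis(A, [1.0, 0.0, 0.0])
print(len(basis), basis)  # 2 vectors: e1 and e2
```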


Example 4 


Let T be the linear operator on P(R) defined by $T(f(x)) = f'(x)$. Then the
T-cyclic subspace generated by $x^2$ is $\operatorname{span}(\{x^2, 2x, 2\}) = P_2(R)$.


The existence of a T-invariant subspace provides the opportunity to define 
a new linear operator whose domain is this subspace. If T is a linear operator 
on V and W is a T-invariant subspace of V, then the restriction Tw of T to 
W (see Appendix B) is a mapping from W to W, and it follows that Tw is 
a linear operator on W (see Exercise 7). As a linear operator, Tw inherits 
certain properties from its parent operator T. The following result illustrates 
one way in which the two operators are linked. 


Theorem 5.21. Let T be a linear operator on a finite-dimensional vector 
space V, and let W be a T-invariant subspace of V. Then the characteristic 
polynomial of Tw divides the characteristic polynomial of T. 


Proof. Choose an ordered basis $\gamma = \{v_1, v_2, \ldots, v_k\}$ for W, and extend it
to an ordered basis $\beta = \{v_1, v_2, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ for V. Let $A = [T]_\beta$ and
$B_1 = [T_W]_\gamma$. Then, by Exercise 12, A can be written in the form

$$A = \begin{pmatrix} B_1 & B_2 \\ O & B_3 \end{pmatrix}.$$

Let f(t) be the characteristic polynomial of T and g(t) the characteristic
polynomial of $T_W$. Then

$$f(t) = \det(A - tI_n) = \det\begin{pmatrix} B_1 - tI_k & B_2 \\ O & B_3 - tI_{n-k} \end{pmatrix} = g(t)\cdot\det(B_3 - tI_{n-k})$$

by Exercise 21 of Section 4.3. Thus g(t) divides f(t). ∎
Example 5
Let T be the linear operator on $R^4$ defined by

$$T(a, b, c, d) = (a + b + 2c - d,\ b + d,\ 2c - d,\ c + d),$$

and let $W = \{(t, s, 0, 0) : t, s \in R\}$. Observe that W is a T-invariant subspace
of $R^4$ because, for any vector $(a, b, 0, 0) \in R^4$,

$$T(a, b, 0, 0) = (a + b, b, 0, 0) \in W.$$

Let $\gamma = \{e_1, e_2\}$, which is an ordered basis for W. Extend γ to the standard
ordered basis β for $R^4$. Then

$$B_1 = [T_W]_\gamma = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad A = [T]_\beta = \begin{pmatrix} 1 & 1 & 2 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

in the notation of Theorem 5.21. Let f(t) be the characteristic polynomial of
T and g(t) be the characteristic polynomial of $T_W$. Then

$$f(t) = \det(A - tI_4) = \det\begin{pmatrix} 1 - t & 1 \\ 0 & 1 - t \end{pmatrix}\cdot\det\begin{pmatrix} 2 - t & -1 \\ 1 & 1 - t \end{pmatrix} = g(t)\cdot\det\begin{pmatrix} 2 - t & -1 \\ 1 & 1 - t \end{pmatrix}.$$

♦
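The divisibility in Example 5 can be confirmed with polynomial division. The sketch below assumes NumPy. Note that `np.poly` returns the coefficients (highest degree first) of $\det(tI - M)$, which equals $(-1)^n\det(M - tI)$. The sign convention therefore does not affect whether one polynomial divides the other.

```python
import numpy as np

A = np.array([[1, 1, 2, -1],
              [0, 1, 0,  1],
              [0, 0, 2, -1],
              [0, 0, 1,  1]], dtype=float)
B1 = A[:2, :2]  # [T_W]_gamma, the upper-left block

# np.poly(M) gives det(tI - M), which differs from det(M - tI)
# only by the factor (-1)^n.
f = np.poly(A)
g = np.poly(B1)
quotient, remainder = np.polydiv(f, g)
print(quotient)                   # coefficients of det(tI - B3): t^2 - 3t + 3
print(np.allclose(remainder, 0))  # g divides f
```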


In view of Theorem 5.21, we may use the characteristic polynomial of Tw 
to gain information about the characteristic polynomial of T itself. In this re- 
gard, cyclic subspaces are useful because the characteristic polynomial of the 
restriction of a linear operator T to a cyclic subspace is readily computable. 


Theorem 5.22. Let T be a linear operator on a finite-dimensional vector
space V, and let W denote the T-cyclic subspace of V generated by a nonzero
vector $v \in V$. Let $k = \dim(W)$. Then

(a) $\{v, T(v), T^2(v), \ldots, T^{k-1}(v)\}$ is a basis for W.


(b) If $a_0v + a_1T(v) + \cdots + a_{k-1}T^{k-1}(v) + T^k(v) = 0$, then the characteristic
polynomial of $T_W$ is $f(t) = (-1)^k(a_0 + a_1t + \cdots + a_{k-1}t^{k-1} + t^k)$.
Proof. (a) Since $v \neq 0$, the set {v} is linearly independent. Let j be the
largest positive integer for which

$$\beta = \{v, T(v), \ldots, T^{j-1}(v)\}$$

is linearly independent. Such a j must exist because V is finite-dimensional.
Let Z = span(β). Then β is a basis for Z. Furthermore, $T^j(v) \in Z$ by
Theorem 1.7 (p. 39). We use this information to show that Z is a T-invariant
subspace of V. Let $w \in Z$. Since w is a linear combination of the vectors of
β, there exist scalars $b_0, b_1, \ldots, b_{j-1}$ such that

$$w = b_0v + b_1T(v) + \cdots + b_{j-1}T^{j-1}(v),$$
and hence
$$T(w) = b_0T(v) + b_1T^2(v) + \cdots + b_{j-1}T^j(v).$$

Thus T(w) is a linear combination of vectors in Z, and hence belongs to Z.
So Z is T-invariant. Furthermore, $v \in Z$. By Exercise 11, W is the smallest
T-invariant subspace of V that contains v, so that $W \subseteq Z$. Clearly, $Z \subseteq W$,
and so we conclude that Z = W. It follows that β is a basis for W, and
therefore dim(W) = j. Thus j = k. This proves (a).

(b) Now view β (from (a)) as an ordered basis for W. Let $a_0, a_1, \ldots, a_{k-1}$
be the scalars such that

$$a_0v + a_1T(v) + \cdots + a_{k-1}T^{k-1}(v) + T^k(v) = 0.$$

Observe that

$$[T_W]_\beta = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix},$$

which has the characteristic polynomial

$$f(t) = (-1)^k(a_0 + a_1t + \cdots + a_{k-1}t^{k-1} + t^k)$$

by Exercise 19. Thus f(t) is the characteristic polynomial of $T_W$, proving (b). ∎
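The matrix $[T_W]_\beta$ in part (b) is determined by the coefficients $a_0, \ldots, a_{k-1}$ alone. The sketch below assumes NumPy. The function `companion` and the sample coefficients are hypothetical. The sketch builds the matrix and compares its characteristic polynomial with the formula of Theorem 5.22(b). Since `np.poly` computes $\det(tI - C) = (-1)^k\det(C - tI)$, it should return the coefficients of $t^k + a_{k-1}t^{k-1} + \cdots + a_0$.

```python
import numpy as np


def companion(a):
    """Return the k x k matrix [T_W]_beta from the proof of Theorem 5.22(b)
    (hypothetical helper): ones on the subdiagonal and last column
    (-a_0, -a_1, ..., -a_{k-1})."""
    k = len(a)
    C = np.zeros((k, k))
    C[1:, :-1] = np.eye(k - 1)  # T maps T^i(v) to T^(i+1)(v) for i < k - 1
    C[:, -1] = -np.asarray(a, dtype=float)
    return C


a = [6.0, -5.0, -2.0]  # hypothetical coefficients a_0, a_1, a_2
C = companion(a)
# Expect t^3 + a_2 t^2 + a_1 t + a_0 = t^3 - 2t^2 - 5t + 6.
print(np.poly(C))
```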


Example 6 


Let T be the linear operator of Example 3, and let $W = \operatorname{span}(\{e_1, e_2\})$, the
T-cyclic subspace generated by $e_1$. We compute the characteristic polyno-
mial f(t) of $T_W$ in two ways: by means of Theorem 5.22 and by means of
determinants.
(a) By means of Theorem 5.22. From Example 3, we have that $\{e_1, e_2\}$ is
a cycle that generates W, and that $T^2(e_1) = -e_1$. Hence

$$1e_1 + 0T(e_1) + T^2(e_1) = 0.$$
Therefore, by Theorem 5.22(b),
$$f(t) = (-1)^2(1 + 0t + t^2) = t^2 + 1.$$


(b) By means of determinants. Let $\beta = \{e_1, e_2\}$, which is an ordered basis
for W. Since $T(e_1) = e_2$ and $T(e_2) = -e_1$, we have

$$[T_W]_\beta = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

and therefore,

$$f(t) = \det\begin{pmatrix} -t & -1 \\ 1 & -t \end{pmatrix} = t^2 + 1.$$

♦

The Cayley—Hamilton Theorem 


As an illustration of the importance of Theorem 5.22, we prove a well- 
known result that is used in Chapter 7. The reader should refer to Ap- 
pendix E for the definition of f(T), where T is a linear operator and f(x) is 
a polynomial. 


Theorem 5.23 (Cayley–Hamilton). Let T be a linear operator on a
finite-dimensional vector space V, and let f(t) be the characteristic polyno-
mial of T. Then $f(T) = T_0$, the zero transformation. That is, T “satisfies”
its characteristic equation.


Proof. We show that $f(T)(v) = 0$ for all $v \in V$. This is obvious if $v = 0$
because f(T) is linear; so suppose that $v \neq 0$. Let W be the T-cyclic subspace
generated by v, and suppose that dim(W) = k. By Theorem 5.22(a), there
exist scalars $a_0, a_1, \ldots, a_{k-1}$ such that

$$a_0v + a_1T(v) + \cdots + a_{k-1}T^{k-1}(v) + T^k(v) = 0.$$
Hence Theorem 5.22(b) implies that
$$g(t) = (-1)^k(a_0 + a_1t + \cdots + a_{k-1}t^{k-1} + t^k)$$

is the characteristic polynomial of $T_W$. Combining these two equations yields

$$g(T)(v) = (-1)^k(a_0I + a_1T + \cdots + a_{k-1}T^{k-1} + T^k)(v) = 0.$$

By Theorem 5.21, g(t) divides f(t); hence there exists a polynomial q(t) such
that $f(t) = q(t)g(t)$. So

$$f(T)(v) = q(T)g(T)(v) = q(T)(g(T)(v)) = q(T)(0) = 0.$$ ∎
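Example 7
Let T be the linear operator on $R^2$ defined by $T(a, b) = (a + 2b, -2a + b)$, and
let $\beta = \{e_1, e_2\}$. Then

$$A = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix},$$

where $A = [T]_\beta$. The characteristic polynomial of T is, therefore,

$$f(t) = \det(A - tI) = \det\begin{pmatrix} 1 - t & 2 \\ -2 & 1 - t \end{pmatrix} = t^2 - 2t + 5.$$

It is easily verified that $T_0 = f(T) = T^2 - 2T + 5I$. Similarly,

$$f(A) = A^2 - 2A + 5I = \begin{pmatrix} -3 & 4 \\ -4 & -3 \end{pmatrix} + \begin{pmatrix} -2 & -4 \\ 4 & -2 \end{pmatrix} + \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

♦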

Example 7 suggests the following result. 


Corollary (Cayley-Hamilton Theorem for Matrices). Let A be
an n × n matrix, and let f(t) be the characteristic polynomial of A. Then
f(A) = O, the n × n zero matrix.


Proof. See Exercise 15. ∎


Invariant Subspaces and Direct Sums³


It is useful to decompose a finite-dimensional vector space V into a direct 
sum of as many T-invariant subspaces as possible because the behavior of T 
on V can be inferred from its behavior on the direct summands. For example, 
T is diagonalizable if and only if V can be decomposed into a direct sum 
of one-dimensional T-invariant subspaces (see Exercise 36). In Chapter 7, 
we consider alternate ways of decomposing V into direct sums of T-invariant 
subspaces if T is not diagonalizable. We proceed to gather a few facts about 
direct sums of T-invariant subspaces that are used in Section 7.4. The first 
of these facts is about characteristic polynomials. 


Theorem 5.24. Let T be a linear operator on a finite-dimensional vector
space V, and suppose that $V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$, where $W_i$ is a T-
invariant subspace of V for each i (1 ≤ i ≤ k). Suppose that $f_i(t)$ is the
characteristic polynomial of $T_{W_i}$ (1 ≤ i ≤ k). Then $f_1(t)\cdot f_2(t)\cdots f_k(t)$ is
the characteristic polynomial of T.


³This subsection uses optional material on direct sums from Section 5.2.
Proof. The proof is by mathematical induction on k. In what follows, f(t)
denotes the characteristic polynomial of T. Suppose first that k = 2. Let $\beta_1$
be an ordered basis for $W_1$, $\beta_2$ an ordered basis for $W_2$, and $\beta = \beta_1 \cup \beta_2$.
Then β is an ordered basis for V by Theorem 5.10(d) (p. 276). Let $A = [T]_\beta$,
$B_1 = [T_{W_1}]_{\beta_1}$, and $B_2 = [T_{W_2}]_{\beta_2}$. By Exercise 34, it follows that

$$A = \begin{pmatrix} B_1 & O \\ O' & B_2 \end{pmatrix},$$

where O and O' are zero matrices of the appropriate sizes. Then
$$f(t) = \det(A - tI) = \det(B_1 - tI)\cdot\det(B_2 - tI) = f_1(t)\cdot f_2(t)$$

as in the proof of Theorem 5.21, proving the result for k = 2.
Now assume that the theorem is valid for k − 1 summands, where k − 1 ≥ 2,
and suppose that V is a direct sum of k subspaces, say,

$$V = W_1 \oplus W_2 \oplus \cdots \oplus W_k.$$

Let $W = W_1 + W_2 + \cdots + W_{k-1}$. It is easily verified that W is T-invariant and
that $V = W \oplus W_k$. So by the case for k = 2, $f(t) = g(t)\cdot f_k(t)$, where g(t) is
the characteristic polynomial of $T_W$. Clearly $W = W_1 \oplus W_2 \oplus \cdots \oplus W_{k-1}$, and
therefore $g(t) = f_1(t)\cdot f_2(t)\cdots f_{k-1}(t)$ by the induction hypothesis. We
conclude that $f(t) = g(t)\cdot f_k(t) = f_1(t)\cdot f_2(t)\cdots f_k(t)$. ∎


As an illustration of this result, suppose that T is a diagonalizable lin-
ear operator on a finite-dimensional vector space V with distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$. By Theorem 5.11 (p. 278), V is a direct sum of the eigenspaces
of T. Since each eigenspace is T-invariant, we may view this situation in the
context of Theorem 5.24. For each eigenvalue $\lambda_i$, the restriction of T to $E_{\lambda_i}$
has characteristic polynomial $(\lambda_i - t)^{m_i}$, where $m_i$ is the dimension of $E_{\lambda_i}$.
By Theorem 5.24, the characteristic polynomial f(t) of T is the product

$$f(t) = (\lambda_1 - t)^{m_1}(\lambda_2 - t)^{m_2}\cdots(\lambda_k - t)^{m_k}.$$


It follows that the multiplicity of each eigenvalue is equal to the dimension 
of the corresponding eigenspace, as expected. 


Example 8
Let T be the linear operator on $R^4$ defined by

$$T(a, b, c, d) = (2a - b,\ a + b,\ c - d,\ c + d),$$

and let $W_1 = \{(s, t, 0, 0) : s, t \in R\}$ and $W_2 = \{(0, 0, s, t) : s, t \in R\}$. Notice
that $W_1$ and $W_2$ are each T-invariant and that $R^4 = W_1 \oplus W_2$. Let $\beta_1 =
\{e_1, e_2\}$, $\beta_2 = \{e_3, e_4\}$, and $\beta = \beta_1 \cup \beta_2 = \{e_1, e_2, e_3, e_4\}$. Then $\beta_1$ is an
ordered basis for $W_1$, $\beta_2$ is an ordered basis for $W_2$, and β is an ordered basis
for $R^4$. Let $A = [T]_\beta$, $B_1 = [T_{W_1}]_{\beta_1}$, and $B_2 = [T_{W_2}]_{\beta_2}$. Then

$$B_1 = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$

and

$$A = \begin{pmatrix} B_1 & O \\ O & B_2 \end{pmatrix} = \begin{pmatrix} 2 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix}.$$

Let f(t), $f_1(t)$, and $f_2(t)$ denote the characteristic polynomials of T, $T_{W_1}$,
and $T_{W_2}$, respectively. Then

$$f(t) = \det(A - tI) = \det(B_1 - tI)\cdot\det(B_2 - tI) = f_1(t)\cdot f_2(t).$$

♦
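The following is a quick numerical check of Example 8, assuming NumPy. It assembles A from the blocks with `np.block`, then compares the characteristic polynomial of A with the product of the characteristic polynomials of $B_1$ and $B_2$. All three are computed as $\det(tI - M)$ by `np.poly`, so the signs are consistent.

```python
import numpy as np

B1 = np.array([[2.0, -1.0],
               [1.0,  1.0]])
B2 = np.array([[1.0, -1.0],
               [1.0,  1.0]])
O = np.zeros((2, 2))
A = np.block([[B1, O],
              [O, B2]])  # [T]_beta for beta = {e1, e2, e3, e4}

f = np.poly(A)   # det(tI - A), highest degree first
f1 = np.poly(B1)
f2 = np.poly(B2)
print(f)                                    # approximately [1, -5, 11, -12, 6]
print(np.allclose(f, np.polymul(f1, f2)))   # f = f1 * f2, as in Theorem 5.24
```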
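The matrix A in Example 8 can be obtained by joining the matrices $B_1$
and $B_2$ in the manner explained in the next definition.


Definition. Let $B_1 \in M_{m\times m}(F)$, and let $B_2 \in M_{n\times n}(F)$. We define the
direct sum of $B_1$ and $B_2$, denoted $B_1 \oplus B_2$, as the $(m+n) \times (m+n)$ matrix
A such that

$$A_{ij} = \begin{cases} (B_1)_{ij} & \text{for } 1 \le i, j \le m \\ (B_2)_{(i-m),(j-m)} & \text{for } m + 1 \le i, j \le n + m \\ 0 & \text{otherwise.} \end{cases}$$

If $B_1, B_2, \ldots, B_k$ are square matrices with entries from F, then we define the
direct sum of $B_1, B_2, \ldots, B_k$ recursively by

$$B_1 \oplus B_2 \oplus \cdots \oplus B_k = (B_1 \oplus B_2 \oplus \cdots \oplus B_{k-1}) \oplus B_k.$$

If $A = B_1 \oplus B_2 \oplus \cdots \oplus B_k$, then we often write

$$A = \begin{pmatrix} B_1 & O & \cdots & O \\ O & B_2 & \cdots & O \\ \vdots & \vdots & & \vdots \\ O & O & \cdots & B_k \end{pmatrix}.$$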
Example 9
Let

$$B_1 = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}, \quad B_2 = (3), \quad\text{and}\quad B_3 = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{pmatrix}.$$

Then

$$B_1 \oplus B_2 \oplus B_3 = \begin{pmatrix} 1 & 2 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix}.$$
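The recursive definition of the direct sum translates directly into code. The following is a minimal sketch that assumes NumPy. The function `direct_sum2` (a hypothetical name) places two square blocks on the diagonal. `functools.reduce` then applies it from the left, which matches $(B_1 \oplus \cdots \oplus B_{k-1}) \oplus B_k$. The sketch uses the blocks $B_1$, $B_2$, $B_3$ of Example 9.

```python
import numpy as np
from functools import reduce


def direct_sum2(B1, B2):
    """Direct sum of an m x m and an n x n matrix (hypothetical helper):
    B1 in the upper-left block, B2 in the lower-right block, 0 elsewhere."""
    m, n = B1.shape[0], B2.shape[0]
    A = np.zeros((m + n, m + n), dtype=np.result_type(B1, B2))
    A[:m, :m] = B1
    A[m:, m:] = B2
    return A


def direct_sum(*blocks):
    """B1 ⊕ B2 ⊕ ... ⊕ Bk, built as (B1 ⊕ ... ⊕ B_{k-1}) ⊕ Bk."""
    return reduce(direct_sum2, blocks)


B1 = np.array([[1, 2], [1, 1]])
B2 = np.array([[3]])
B3 = np.array([[1, 2, 1], [1, 2, 3], [1, 1, 1]])
print(direct_sum(B1, B2, B3))
```

SciPy's `scipy.linalg.block_diag(B1, B2, B3)` builds the same block-diagonal matrix.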


The final result of this section relates direct sums of matrices to direct
sums of invariant subspaces. It is an extension of Exercise 34 to the case
k > 2.


Theorem 5.25. Let T be a linear operator on a finite-dimensional vector
space V, and let $W_1, W_2, \ldots, W_k$ be T-invariant subspaces of V such that
$V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$. For each i, let $\beta_i$ be an ordered basis for $W_i$, and
let $\beta = \beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$. Let $A = [T]_\beta$ and $B_i = [T_{W_i}]_{\beta_i}$ for i = 1, 2, ..., k.
Then $A = B_1 \oplus B_2 \oplus \cdots \oplus B_k$.


Proof. See Exercise 35. ∎


EXERCISES 


1. Label the following statements as true or false. 


(a) There exists a linear operator T with no T-invariant subspace.
(b) If T is a linear operator on a finite-dimensional vector space V and
W is a T-invariant subspace of V, then the characteristic polyno-
mial of $T_W$ divides the characteristic polynomial of T.
(c) Let T be a linear operator on a finite-dimensional vector space V,
and let v and w be in V. If W is the T-cyclic subspace generated
by v, W' is the T-cyclic subspace generated by w, and W = W',
then v = w.
(d) If T is a linear operator on a finite-dimensional vector space V,
then for any $v \in V$ the T-cyclic subspace generated by v is the
same as the T-cyclic subspace generated by T(v).
(e) Let T be a linear operator on an n-dimensional vector space. Then
there exists a polynomial g(t) of degree n such that $g(T) = T_0$.
(f) Any polynomial of degree n with leading coefficient $(-1)^n$ is the
characteristic polynomial of some linear operator.
(g) If T is a linear operator on a finite-dimensional vector space V, and
if V is the direct sum of k T-invariant subspaces, then there is an
ordered basis β for V such that $[T]_\beta$ is a direct sum of k matrices.
2. For each of the following linear operators T on the vector space V,
determine whether the given subspace W is a T-invariant subspace of
V.

(a) $V = P_3(R)$, $T(f(x)) = f'(x)$, and $W = P_2(R)$

(b) $V = P(R)$, $T(f(x)) = xf(x)$, and $W = P_2(R)$

(c) $V = R^3$, $T(a, b, c) = (a + b + c, a + b + c, a + b + c)$, and
$W = \{(t, t, t) : t \in R\}$

(d) $V = C([0, 1])$, $T(f(t)) = \left[\int_0^1 f(x)\,dx\right]t$, and
$W = \{f \in V : f(t) = at + b \text{ for some } a \text{ and } b\}$


(e) $V = M_{2\times 2}(R)$, T(A) = (7 i) A, and $W = \{A \in V : A^t = A\}$


3. Let T be a linear operator on a finite-dimensional vector space V. Prove
that the following subspaces are T-invariant.

(a) {0} and V

(b) N(T) and R(T)

(c) $E_\lambda$, for any eigenvalue λ of T


4. Let T be a linear operator on a vector space V, and let W be a T-
invariant subspace of V. Prove that W is g(T)-invariant for any poly-
nomial g(t).


5. Let T be a linear operator on a vector space V. Prove that the inter-
section of any collection of T-invariant subspaces of V is a T-invariant
subspace of V.


6. For each linear operator T on the vector space V, find an ordered basis
for the T-cyclic subspace generated by the vector z.

(a) $V = R^4$, $T(a, b, c, d) = (a + b, b - c, a + c, a + d)$, and $z = e_1$.

(b) $V = P_3(R)$, $T(f(x)) = f''(x)$, and $z = x^3$.

(c) $V = M_{2\times 2}(R)$, $T(A) = A^t$, and z =


(d) V =Moxa(R), T(A) = (; 5) Amd y= e a 


7. Prove that the restriction of a linear operator T to a T-invariant sub-
space is a linear operator on that subspace.


8. Let T be a linear operator on a vector space with a T-invariant subspace
W. Prove that if v is an eigenvector of $T_W$ with corresponding eigenvalue
λ, then the same is true for T.


9. For each linear operator T and cyclic subspace W in Exercise 6, compute
the characteristic polynomial of $T_W$ in two ways, as in Example 6.


10. For each linear operator in Exercise 6, find the characteristic polynomial
f(t) of T, and verify that the characteristic polynomial of Tw (computed 
in Exercise 9) divides f(t). 


11. Let T be a linear operator on a vector space V, let v be a nonzero vector
in V, and let W be the T-cyclic subspace of V generated by v. Prove 
that 


(a) W is T-invariant. 
(b) Any T-invariant subspace of V containing v also contains W. 


12. Prove that $A = \begin{pmatrix} B_1 & B_2 \\ O & B_3 \end{pmatrix}$ in the proof of Theorem 5.21.

13. Let T be a linear operator on a vector space V, let v be a nonzero vector
in V, and let W be the T-cyclic subspace of V generated by v. For any
$w \in V$, prove that $w \in W$ if and only if there exists a polynomial g(t)
such that $w = g(T)(v)$.


14. Prove that the polynomial g(t) of Exercise 13 can always be chosen so
that its degree is less than or equal to dim(W).


15. Use the Cayley-Hamilton theorem (Theorem 5.23) to prove its corol-
lary for matrices. Warning: If $f(t) = \det(A - tI)$ is the characteristic
polynomial of A, it is tempting to “prove” that f(A) = O by saying
“$f(A) = \det(A - AI) = \det(O) = 0$.” But this argument is nonsense.
Why?

16. Let T be a linear operator on a finite-dimensional vector space V.


(a) Prove that if the characteristic polynomial of T splits, then so 
does the characteristic polynomial of the restriction of T to any 
T-invariant subspace of V. 

(b) Deduce that if the characteristic polynomial of T splits, then any 
nontrivial T-invariant subspace of V contains an eigenvector of T. 


17. Let A be an $n \times n$ matrix. Prove that $\dim(\operatorname{span}(\{I_n, A, A^2, \ldots\})) \le n$.
18. Let A be an $n \times n$ matrix with characteristic polynomial
$$f(t) = (-1)^n t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0.$$

(a) Prove that A is invertible if and only if $a_0 \neq 0$.
(b) Prove that if A is invertible, then
$$A^{-1} = (-1/a_0)\left[(-1)^n A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 I_n\right].$$
(c) Use (b) to compute $A^{-1}$ for
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 3 \\ 0 & 0 & -1 \end{pmatrix}.$$
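The formula in (b) can be checked with exact rational arithmetic. The sketch below is only an illustration, not the method the exercise intends: it obtains the coefficients of $f(t) = \det(A - tI)$ by the Faddeev–LeVerrier recursion (a technique not discussed in this text) and then evaluates the bracket in (b) by Horner's rule. The helper names (`char_poly_coeffs`, `inverse_via_char_poly`, and so on) are our own.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def char_poly_coeffs(A):
    """Coefficients [a_0, a_1, ..., a_n] of f(t) = det(A - tI).

    Faddeev-LeVerrier (an assumption of this sketch) produces
    p(t) = det(tI - A) = t^n + c_{n-1} t^{n-1} + ... + c_0; then f = (-1)^n p.
    """
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return [(-1) ** n * ck for ck in c]


def inverse_via_char_poly(A):
    """A^{-1} = (-1/a_0)[(-1)^n A^{n-1} + a_{n-1} A^{n-2} + ... + a_1 I]."""
    n = len(A)
    a = char_poly_coeffs(A)
    if a[0] == 0:
        raise ValueError("a_0 = 0, so A is not invertible")
    # Horner's rule: S = a_n A^{n-1} + a_{n-1} A^{n-2} + ... + a_1 I,
    # where a_n = (-1)^n is the leading coefficient of f.
    S = [[a[n] * x for x in row] for row in identity(n)]
    for k in range(n - 1, 0, -1):
        AS = matmul(A, S)
        S = [[AS[i][j] + (a[k] if i == j else 0) for j in range(n)]
             for i in range(n)]
    return [[-x / a[0] for x in row] for row in S]


A = [[Fraction(x) for x in row] for row in [[1, 2, 1], [0, 2, 3], [0, 0, -1]]]
print(char_poly_coeffs(A))
print(inverse_via_char_poly(A))
```

Multiplying `A` by the returned matrix and comparing with `identity(3)` confirms the result of (c).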
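19. Let A denote the $k \times k$ matrix
$$A = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & -a_{k-2} \\ 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix},$$
where $a_0, a_1, \ldots, a_{k-1}$ are arbitrary scalars. Prove that the characteristic polynomial of A is
$$(-1)^k(a_0 + a_1 t + \cdots + a_{k-1}t^{k-1} + t^k).$$

Hint: Use mathematical induction on k, expanding the determinant along the first row.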
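A numerical sanity check of the claim in Exercise 19 (not a proof for general k): the sketch builds the matrix for one hypothetical choice of scalars $a_0, \ldots, a_3$, evaluates $\det(A - tI)$ exactly at several integers t by Gaussian elimination, and compares the values with $(-1)^k(a_0 + a_1 t + \cdots + a_{k-1}t^{k-1} + t^k)$. Both sides are polynomials of degree k, so agreement at more than k points forces equality for that particular choice of scalars.

```python
from fractions import Fraction


def companion(a):
    """k x k matrix of Exercise 19 built from the scalars a = [a_0, ..., a_{k-1}]."""
    k = len(a)
    A = [[Fraction(0)] * k for _ in range(k)]
    for i in range(1, k):
        A[i][i - 1] = Fraction(1)        # ones on the subdiagonal
    for i in range(k):
        A[i][k - 1] = Fraction(-a[i])    # last column: -a_0, ..., -a_{k-1}
    return A


def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d                       # a row swap changes the sign
        d *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [M[r][j] - factor * M[col][j] for j in range(n)]
    return d


def char_poly_at(A, t):
    """Value of f(t) = det(A - tI) at a particular scalar t."""
    n = len(A)
    return det([[A[i][j] - (t if i == j else 0) for j in range(n)] for i in range(n)])


def claimed_value(a, t):
    """(-1)^k (a_0 + a_1 t + ... + a_{k-1} t^{k-1} + t^k)."""
    k = len(a)
    return (-1) ** k * (sum(ai * t ** i for i, ai in enumerate(a)) + t ** k)


a = [3, -1, 4, 2]      # hypothetical sample scalars a_0, ..., a_3, so k = 4
A = companion(a)
for t in range(-3, 4):
    assert char_poly_at(A, t) == claimed_value(a, t)
```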


20. Let T be a linear operator on a vector space V, and suppose that V is a T-cyclic subspace of itself. Prove that if U is a linear operator on V, then $UT = TU$ if and only if $U = g(T)$ for some polynomial $g(t)$. Hint: Suppose that V is generated by v. Choose $g(t)$ according to Exercise 13 so that $g(T)(v) = U(v)$.


21. Let T be a linear operator on a two-dimensional vector space V. Prove that either V is a T-cyclic subspace of itself or $T = cI$ for some scalar c.


22. Let T be a linear operator on a two-dimensional vector space V and suppose that $T \neq cI$ for any scalar c. Show that if U is any linear operator on V such that $UT = TU$, then $U = g(T)$ for some polynomial $g(t)$.


23. Let T be a linear operator on a finite-dimensional vector space V, and let W be a T-invariant subspace of V. Suppose that $v_1, v_2, \ldots, v_k$ are eigenvectors of T corresponding to distinct eigenvalues. Prove that if $v_1 + v_2 + \cdots + v_k$ is in W, then $v_i \in W$ for all i. Hint: Use mathematical induction on k.


24. Prove that the restriction of a diagonalizable linear operator T to any nontrivial T-invariant subspace is also diagonalizable. Hint: Use the result of Exercise 23.
25. (a) Prove the converse to Exercise 18(a) of Section 5.2: If T and U are diagonalizable linear operators on a finite-dimensional vector space V such that $UT = TU$, then T and U are simultaneously diagonalizable. (See the definitions in the exercises of Section 5.2.) Hint: For any eigenvalue $\lambda$ of T, show that $E_\lambda$ is U-invariant, and apply Exercise 24 to obtain a basis for $E_\lambda$ of eigenvectors of U.

(b) State and prove a matrix version of (a). 


26. Let T be a linear operator on an n-dimensional vector space V such that T has n distinct eigenvalues. Prove that V is a T-cyclic subspace of itself. Hint: Use Exercise 23 to find a vector v such that $\{v, T(v), \ldots, T^{n-1}(v)\}$ is linearly independent.


Exercises 27 through 32 require familiarity with quotient spaces as defined 
in Exercise 31 of Section 1.3. Before attempting these exercises, the reader 
should first review the other exercises treating quotient spaces: Exercise 35 
of Section 1.6, Exercise 40 of Section 2.1, and Exercise 24 of Section 2.4. 


For the purposes of Exercises 27 through 32, T is a fixed linear operator on 
a finite-dimensional vector space V, and W is a nonzero T-invariant subspace 
of V. We require the following definition. 


Definition. Let T be a linear operator on a vector space V, and let W be a T-invariant subspace of V. Define $\overline{T}\colon V/W \to V/W$ by
$$\overline{T}(v + W) = T(v) + W \quad\text{for any } v + W \in V/W.$$


27. (a) Prove that $\overline{T}$ is well defined. That is, show that $\overline{T}(v + W) = \overline{T}(v' + W)$ whenever $v + W = v' + W$.
(b) Prove that $\overline{T}$ is a linear operator on V/W.
(c) Let $\eta\colon V \to V/W$ be the linear transformation defined in Exercise 40 of Section 2.1 by $\eta(v) = v + W$. Show that the diagram of Figure 5.6 commutes; that is, prove that $\eta T = \overline{T}\eta$. (This exercise does not require the assumption that V is finite-dimensional.)


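Figure 5.6: the square with $T\colon V \to V$ across the top, $\eta$ down both sides, and $\overline{T}\colon V/W \to V/W$ across the bottom.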


28. Let $f(t)$, $g(t)$, and $h(t)$ be the characteristic polynomials of T, $T_W$, and $\overline{T}$, respectively. Prove that $f(t) = g(t)h(t)$. Hint: Extend an ordered basis $\gamma = \{v_1, v_2, \ldots, v_k\}$ for W to an ordered basis $\beta = \{v_1, v_2, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ for V. Then show that the collection of cosets $\alpha = \{v_{k+1} + W, v_{k+2} + W, \ldots, v_n + W\}$ is an ordered basis for V/W, and prove that
$$[T]_\beta = \begin{pmatrix} B_1 & B_2 \\ O & B_3 \end{pmatrix},$$
where $B_1 = [T_W]_\gamma$ and $B_3 = [\overline{T}]_\alpha$.


29. Use the hint in Exercise 28 to prove that if T is diagonalizable, then so is $\overline{T}$.


30. Prove that if both $T_W$ and $\overline{T}$ are diagonalizable and have no common eigenvalues, then T is diagonalizable.


The results of Theorem 5.22 and Exercise 28 are useful in devising methods 
for computing characteristic polynomials without the use of determinants. 
This is illustrated in the next exercise. 


31. Let $A = \begin{pmatrix} 1 & 1 & -3 \\ 2 & 3 & 4 \\ 1 & 2 & 1 \end{pmatrix}$, let $T = L_A$, and let W be the cyclic subspace of $R^3$ generated by $e_1$.


(a) Use Theorem 5.22 to compute the characteristic polynomial of $T_W$.

(b) Show that $\{e_2 + W\}$ is a basis for $R^3/W$, and use this fact to compute the characteristic polynomial of $\overline{T}$.

(c) Use the results of (a) and (b) to find the characteristic polynomial 
of A. 
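For the matrix of Exercise 31 the three steps can be carried out by hand; the sketch below mirrors them in Python so the arithmetic can be checked. It relies on facts specific to this A (for instance, that $T(e_1)$ has a nonzero third coordinate and that W turns out to be two-dimensional), so it is not a general algorithm. Storing polynomials as coefficient lists $[c_0, c_1, \ldots]$ is a convention of this sketch.

```python
from fractions import Fraction

A = [[1, 1, -3], [2, 3, 4], [1, 2, 1]]


def apply(M, v):
    """The vector M v for a 3 x 3 matrix M."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists [c_0, c_1, ...]."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


e1, e2 = [1, 0, 0], [0, 1, 0]
u = apply(A, e1)          # T(e1)
w = apply(A, u)           # T^2(e1)

# (a) Write T^2(e1) = c0*e1 + c1*T(e1). The third coordinate gives c1, the
# first gives c0, and the second checks that T^2(e1) lies in span{e1, T(e1)},
# so dim(W) = 2.
c1 = Fraction(w[2], u[2])
c0 = w[0] - c1 * u[0]
assert w[1] == c1 * u[1]
# Theorem 5.22 with k = 2: g(t) = (-1)^2 (a0 + a1 t + t^2), a0 = -c0, a1 = -c1.
g = [-c0, -c1, 1]

# (b) W = span{e1, T(e1)} consists of the vectors (x, 2z, z) for this A, so
# T(e2) + W = lam*(e2 + W) exactly when T(e2) - lam*e2 has y = 2z.
Te2 = apply(A, e2)
lam = Te2[1] - 2 * Te2[2]
h = [lam, -1]             # characteristic polynomial of T-bar: lam - t

# (c) Exercise 28: f(t) = g(t) h(t).
f = poly_mul(g, h)
print(f)
```

The resulting coefficients can be compared with a direct expansion of $\det(A - tI)$.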


32. Prove the converse to Exercise 9(a) of Section 5.2: If the characteristic polynomial of T splits, then there is an ordered basis $\beta$ for V such that $[T]_\beta$ is an upper triangular matrix. Hints: Apply mathematical induction on $\dim(V)$. First prove that T has an eigenvector v, let $W = \operatorname{span}(\{v\})$, and apply the induction hypothesis to $\overline{T}\colon V/W \to V/W$. Exercise 35(b) of Section 1.6 is helpful here.


Exercises 33 through 40 are concerned with direct sums. 


33. Let T be a linear operator on a vector space V, and let $W_1, W_2, \ldots, W_k$ be T-invariant subspaces of V. Prove that $W_1 + W_2 + \cdots + W_k$ is also a T-invariant subspace of V.


34. Give a direct proof of Theorem 5.25 for the case k = 2. (This result is 
used in the proof of Theorem 5.24.) 


35. Prove Theorem 5.25. Hint: Begin with Exercise 34 and extend it using 
mathematical induction on k, the number of subspaces. 


36. Let T be a linear operator on a finite-dimensional vector space V. Prove that T is diagonalizable if and only if V is the direct sum of one-dimensional T-invariant subspaces.


37. Let T be a linear operator on a finite-dimensional vector space V, and let $W_1, W_2, \ldots, W_k$ be T-invariant subspaces of V such that $V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$. Prove that
$$\det(T) = \det(T_{W_1})\det(T_{W_2})\cdots\det(T_{W_k}).$$


38. Let T be a linear operator on a finite-dimensional vector space V, and let $W_1, W_2, \ldots, W_k$ be T-invariant subspaces of V such that $V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$. Prove that T is diagonalizable if and only if $T_{W_i}$ is diagonalizable for all i.


39. Let $\mathcal{C}$ be a collection of diagonalizable linear operators on a finite-dimensional vector space V. Prove that there is an ordered basis $\beta$ such that $[T]_\beta$ is a diagonal matrix for all $T \in \mathcal{C}$ if and only if the operators of $\mathcal{C}$ commute under composition. (This is an extension of Exercise 25.) Hints for the case that the operators commute: The result is trivial if each operator has only one eigenvalue. Otherwise, establish the general result by mathematical induction on $\dim(V)$, using the fact that V is the direct sum of the eigenspaces of some operator in $\mathcal{C}$ that has more than one eigenvalue.


40. Let $B_1, B_2, \ldots, B_k$ be square matrices with entries in the same field, and let $A = B_1 \oplus B_2 \oplus \cdots \oplus B_k$. Prove that the characteristic polynomial of A is the product of the characteristic polynomials of the $B_i$'s.


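41. Let
$$A = \begin{pmatrix} 1 & 2 & \cdots & n \\ n+1 & n+2 & \cdots & 2n \\ \vdots & \vdots & & \vdots \\ n^2-n+1 & n^2-n+2 & \cdots & n^2 \end{pmatrix}.$$

Find the characteristic polynomial of A. Hint: First prove that A has rank 2 and that $\operatorname{span}(\{(1, 1, \ldots, 1), (1, 2, \ldots, n)\})$ is $L_A$-invariant.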


42. Let $A \in M_{n\times n}(R)$ be the matrix defined by $A_{ij} = 1$ for all i and j. Find the characteristic polynomial of A.
Inner Product Spaces


6.1 Inner Products and Norms 

6.2 The Gram–Schmidt Orthogonalization Process and Orthogonal Complements

6.3 The Adjoint of a Linear Operator

6.4 Normal and Self-Adjoint Operators 

6.5 Unitary and Orthogonal Operators and Their Matrices 

6.6 Orthogonal Projections and the Spectral Theorem 

6.7* The Singular Value Decomposition and the Pseudoinverse 

6.8* Bilinear and Quadratic Forms 

6.9* Einstein's Special Theory of Relativity 

6.10* Conditioning and the Rayleigh Quotient 

6.11* The Geometry of Orthogonal Operators 


Most applications of mathematics are involved with the concept of measurement and hence of the magnitude or relative size of various quantities. So it is not surprising that the fields of real and complex numbers, which have a built-in notion of distance, should play a special role. Except for Section 6.8, we assume that all vector spaces are over the field F, where F denotes either R or C. (See Appendix D for properties of complex numbers.)

We introduce the idea of distance or length into vector spaces via a much 
richer structure, the so-called inner product space structure. This added 
structure provides applications to geometry (Sections 6.5 and 6.11), physics 
(Section 6.9), conditioning in systems of linear equations (Section 6.10), least 
squares (Section 6.3), and quadratic forms (Section 6.8). 


6.1 INNER PRODUCTS AND NORMS 


Many geometric notions such as angle, length, and perpendicularity in $R^2$ and $R^3$ may be extended to more general real and complex vector spaces. All of these ideas are related to the concept of inner product.


Definition. Let V be a vector space over F. An inner product on V 
is a function that assigns, to every ordered pair of vectors x and y in V, a 
scalar in F, denoted $\langle x, y\rangle$, such that for all x, y, and z in V and all c in F, the following hold:

(a) $\langle x + z, y\rangle = \langle x, y\rangle + \langle z, y\rangle$.
(b) $\langle cx, y\rangle = c\langle x, y\rangle$.
(c) $\overline{\langle x, y\rangle} = \langle y, x\rangle$, where the bar denotes complex conjugation.
(d) $\langle x, x\rangle > 0$ if $x \neq 0$.


Note that (c) reduces to $\langle x, y\rangle = \langle y, x\rangle$ if $F = R$. Conditions (a) and (b) simply require that the inner product be linear in the first component.
It is easily shown that if $a_1, a_2, \ldots, a_n \in F$ and $y, v_1, v_2, \ldots, v_n \in V$, then
$$\left\langle \sum_{i=1}^{n} a_i v_i, y\right\rangle = \sum_{i=1}^{n} a_i\langle v_i, y\rangle.$$


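Example 1
For $x = (a_1, a_2, \ldots, a_n)$ and $y = (b_1, b_2, \ldots, b_n)$ in $F^n$, define
$$\langle x, y\rangle = \sum_{i=1}^{n} a_i\overline{b_i}.$$

The verification that $\langle\cdot,\cdot\rangle$ satisfies conditions (a) through (d) is easy. For example, if $z = (c_1, c_2, \ldots, c_n)$, we have for (a)
$$\langle x + z, y\rangle = \sum_{i=1}^{n}(a_i + c_i)\overline{b_i} = \sum_{i=1}^{n} a_i\overline{b_i} + \sum_{i=1}^{n} c_i\overline{b_i} = \langle x, y\rangle + \langle z, y\rangle.$$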


Thus, for $x = (1 + i, 4)$ and $y = (2 - 3i, 4 + 5i)$ in $C^2$,
$$\langle x, y\rangle = (1 + i)(2 + 3i) + 4(4 - 5i) = 15 - 15i.$$


The inner product in Example 1 is called the standard inner product on $F^n$. When $F = R$ the conjugations are not needed, and in early courses this standard inner product is usually called the dot product and is denoted by $x \cdot y$ instead of $\langle x, y\rangle$.
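The computation in Example 1 can be reproduced with Python's built-in complex numbers; the function name `std_inner` is our own. Swapping the arguments returns the complex conjugate, as condition (c) requires.

```python
def std_inner(x, y):
    """Standard inner product on F^n: <x, y> = sum over i of a_i * conj(b_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


x = (1 + 1j, 4)
y = (2 - 3j, 4 + 5j)
print(std_inner(x, y))   # (15-15j)
print(std_inner(y, x))   # (15+15j), the conjugate of <x, y>
```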


Example 2 


If $\langle x, y\rangle$ is any inner product on a vector space V and $r > 0$, we may define another inner product by the rule $\langle x, y\rangle' = r\langle x, y\rangle$. If $r \le 0$, then (d) would not hold.
Example 3
Let $V = C([0,1])$, the vector space of real-valued continuous functions on [0,1]. For $f, g \in V$, define $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$. Since the preceding integral is linear in f, (a) and (b) are immediate, and (c) is trivial. If $f \neq 0$, then $f^2$ is bounded away from zero on some subinterval of [0,1] (continuity is used here), and hence $\langle f, f\rangle = \int_0^1 [f(t)]^2\,dt > 0$.


Definition. Let $A \in M_{m\times n}(F)$. We define the conjugate transpose or adjoint of A to be the $n \times m$ matrix $A^*$ such that $(A^*)_{ij} = \overline{A_{ji}}$ for all i, j.


Example 4 
Let 


Then 


Notice that if x and y are viewed as column vectors in $F^n$, then $\langle x, y\rangle = y^*x$.

The conjugate transpose of a matrix plays a very important role in the 
remainder of this chapter. In the case that A has real entries, A* is simply 
the transpose of A. 


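Example 5
Let $V = M_{n\times n}(F)$, and define $\langle A, B\rangle = \operatorname{tr}(B^*A)$ for $A, B \in V$. (Recall that the trace of a matrix A is defined by $\operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}$.) We verify that (a) and (d) of the definition of inner product hold and leave (b) and (c) to the reader. For this purpose, let $A, B, C \in V$. Then (using Exercise 6 of Section 1.3)
$$\langle A + B, C\rangle = \operatorname{tr}(C^*(A + B)) = \operatorname{tr}(C^*A + C^*B) = \operatorname{tr}(C^*A) + \operatorname{tr}(C^*B) = \langle A, C\rangle + \langle B, C\rangle.$$

Also
$$\langle A, A\rangle = \operatorname{tr}(A^*A) = \sum_{i=1}^{n}(A^*A)_{ii} = \sum_{i=1}^{n}\sum_{k=1}^{n}(A^*)_{ik}A_{ki} = \sum_{i=1}^{n}\sum_{k=1}^{n}\overline{A_{ki}}\,A_{ki} = \sum_{i=1}^{n}\sum_{k=1}^{n}|A_{ki}|^2.$$

Now if $A \neq O$, then $A_{ki} \neq 0$ for some k and i. So $\langle A, A\rangle > 0$.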
The inner product on $M_{n\times n}(F)$ in Example 5 is called the Frobenius inner product.
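A short sketch comparing two expressions for the Frobenius inner product: $\operatorname{tr}(B^*A)$ and the entrywise sum $\sum_{i,k} A_{ki}\overline{B_{ki}}$, which follows from the same index computation used in Example 5 for $\langle A, A\rangle$. The sample matrices are arbitrary choices for illustration, and the helper names are our own.

```python
def conj_transpose(A):
    """A* : the transpose of A with every entry conjugated."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def frobenius(A, B):
    """<A, B> = tr(B* A), as defined in Example 5."""
    return trace(matmul(conj_transpose(B), A))


def entrywise(A, B):
    """The same quantity written as a sum over entries: sum of A_ki * conj(B_ki)."""
    return sum(a * b.conjugate() for ra, rb in zip(A, B) for a, b in zip(ra, rb))


A = [[1, 2j], [3 - 1j, 0]]     # arbitrary sample matrices
B = [[2, 1 + 1j], [0, 1j]]
print(frobenius(A, B), entrywise(A, B))
print(frobenius(A, A))         # sum of |A_ki|^2, a positive real number
```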

A vector space V over F endowed with a specific inner product is called an inner product space. If $F = C$, we call V a complex inner product space, whereas if $F = R$, we call V a real inner product space.

It is clear that if V has an inner product $\langle x, y\rangle$ and W is a subspace of V, then W is also an inner product space when the same function $\langle x, y\rangle$ is restricted to the vectors $x, y \in W$.

Thus Examples 1, 3, and 5 also provide examples of inner product spaces. For the remainder of this chapter, $F^n$ denotes the inner product space with the standard inner product as defined in Example 1. Likewise, $M_{n\times n}(F)$ denotes the inner product space with the Frobenius inner product as defined in Example 5. The reader is cautioned that two distinct inner products on a given vector space yield two distinct inner product spaces. For instance, it can be shown that both
$$\langle f(x), g(x)\rangle_1 = \int_0^1 f(t)g(t)\,dt \quad\text{and}\quad \langle f(x), g(x)\rangle_2 = \int_{-1}^{1} f(t)g(t)\,dt$$
are inner products on the vector space P(R). Even though the underlying vector space is the same, however, these two inner products yield two different inner product spaces. For example, the polynomials $f(x) = x$ and $g(x) = x^2$ are orthogonal in the second inner product space, but not in the first.

A very important inner product space that resembles C([0,1]) is the space H of continuous complex-valued functions defined on the interval $[0, 2\pi]$ with the inner product
$$\langle f, g\rangle = \frac{1}{2\pi}\int_0^{2\pi} f(t)\overline{g(t)}\,dt.$$

The reason for the constant $1/2\pi$ will become evident later. This inner product space, which arises often in the context of physical situations, is examined more closely in later sections.

At this point, we mention a few facts about integration of complex-valued functions. First, the imaginary number i can be treated as a constant under the integration sign. Second, every complex-valued function f may be written as $f = f_1 + if_2$, where $f_1$ and $f_2$ are real-valued functions. Thus we have
$$\int f = \int f_1 + i\int f_2 \quad\text{and}\quad \int\overline{f} = \overline{\int f}.$$

From these properties, as well as the assumption of continuity, it follows that H is an inner product space (see Exercise 16(a)).

Some properties that follow easily from the definition of an inner product 
are contained in the next theorem. 
Theorem 6.1. Let V be an inner product space. Then for $x, y, z \in V$ and $c \in F$, the following statements are true.
(a) $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$.
(b) $\langle x, cy\rangle = \bar{c}\langle x, y\rangle$.
(c) $\langle x, 0\rangle = \langle 0, x\rangle = 0$.
(d) $\langle x, x\rangle = 0$ if and only if $x = 0$.
(e) If $\langle x, y\rangle = \langle x, z\rangle$ for all $x \in V$, then $y = z$.


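Proof. (a) We have
$$\langle x, y + z\rangle = \overline{\langle y + z, x\rangle} = \overline{\langle y, x\rangle + \langle z, x\rangle} = \overline{\langle y, x\rangle} + \overline{\langle z, x\rangle} = \langle x, y\rangle + \langle x, z\rangle.$$

The proofs of (b), (c), (d), and (e) are left as exercises. ■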


The reader should observe that (a) and (b) of Theorem 6.1 show that the 
inner product is conjugate linear in the second component. 

In order to generalize the notion of length in $R^3$ to arbitrary inner product spaces, we need only observe that the length of $x = (a, b, c) \in R^3$ is given by $\sqrt{a^2 + b^2 + c^2} = \sqrt{\langle x, x\rangle}$. This leads to the following definition.


Definition. Let V be an inner product space. For $x \in V$, we define the norm or length of x by $\|x\| = \sqrt{\langle x, x\rangle}$.


Example 6
Let $V = F^n$. If $x = (a_1, a_2, \ldots, a_n)$, then
$$\|x\| = \|(a_1, a_2, \ldots, a_n)\| = \left[\sum_{i=1}^{n}|a_i|^2\right]^{1/2}$$
is the Euclidean definition of length. Note that if $n = 1$, we have $\|a\| = |a|$.


As we might expect, the well-known properties of Euclidean length in $R^3$ hold in general, as shown next.


Theorem 6.2. Let V be an inner product space over F. Then for all $x, y \in V$ and $c \in F$, the following statements are true.
(a) $\|cx\| = |c|\cdot\|x\|$.
(b) $\|x\| = 0$ if and only if $x = 0$. In any case, $\|x\| \ge 0$.
(c) (Cauchy–Schwarz Inequality) $|\langle x, y\rangle| \le \|x\|\cdot\|y\|$.
(d) (Triangle Inequality) $\|x + y\| \le \|x\| + \|y\|$.




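Proof. We leave the proofs of (a) and (b) as exercises.
(c) If $y = 0$, then the result is immediate. So assume that $y \neq 0$. For any $c \in F$, we have
$$0 \le \|x - cy\|^2 = \langle x - cy, x - cy\rangle = \langle x, x - cy\rangle - c\langle y, x - cy\rangle = \langle x, x\rangle - \bar{c}\langle x, y\rangle - c\langle y, x\rangle + c\bar{c}\langle y, y\rangle.$$

In particular, if we set
$$c = \frac{\langle x, y\rangle}{\langle y, y\rangle},$$
the inequality becomes
$$0 \le \langle x, x\rangle - \frac{|\langle x, y\rangle|^2}{\langle y, y\rangle} = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2},$$
from which (c) follows.
(d) We have
$$\begin{aligned} \|x + y\|^2 &= \langle x + y, x + y\rangle = \langle x, x\rangle + \langle y, x\rangle + \langle x, y\rangle + \langle y, y\rangle \\ &= \|x\|^2 + 2\,\Re\langle x, y\rangle + \|y\|^2 \\ &\le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \\ &\le \|x\|^2 + 2\|x\|\cdot\|y\| + \|y\|^2 \\ &= (\|x\| + \|y\|)^2, \end{aligned}$$
where $\Re\langle x, y\rangle$ denotes the real part of the complex number $\langle x, y\rangle$. Note that we used (c) to prove (d). ■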


The case when equality results in (c) and (d) is considered in Exercise 15. 
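The proof of (c) rests on the choice $c = \langle x, y\rangle/\langle y, y\rangle$, which makes $x - cy$ orthogonal to y and turns $0 \le \|x - cy\|^2$ into the inequality. The sketch below checks these facts, together with the triangle inequality, on randomly generated vectors in $C^4$ with the standard inner product of Example 1. The floating-point tolerances are an assumption of the sketch, and random testing is no substitute for the proof.

```python
import math
import random


def inner(x, y):
    """Standard inner product on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x).real)


def rand_vec(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]


random.seed(0)
for _ in range(1000):
    x, y = rand_vec(4), rand_vec(4)
    c = inner(x, y) / inner(y, y)          # the choice of c in the proof of (c)
    r = [a - c * b for a, b in zip(x, y)]  # r = x - c y
    assert abs(inner(r, y)) < 1e-9         # r is orthogonal to y
    # ||x - cy||^2 = ||x||^2 - |<x, y>|^2 / ||y||^2, which must be >= 0
    assert math.isclose(norm(r) ** 2,
                        norm(x) ** 2 - abs(inner(x, y)) ** 2 / norm(y) ** 2,
                        rel_tol=1e-9, abs_tol=1e-9)
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12    # Cauchy-Schwarz
    s = [a + b for a, b in zip(x, y)]
    assert norm(s) <= norm(x) + norm(y) + 1e-12             # triangle inequality
```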


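Example 7
For $F^n$, we may apply (c) and (d) of Theorem 6.2 to the standard inner product to obtain the following well-known inequalities:
$$\left|\sum_{i=1}^{n} a_i\overline{b_i}\right| \le \left[\sum_{i=1}^{n}|a_i|^2\right]^{1/2}\left[\sum_{i=1}^{n}|b_i|^2\right]^{1/2}$$
and
$$\left[\sum_{i=1}^{n}|a_i + b_i|^2\right]^{1/2} \le \left[\sum_{i=1}^{n}|a_i|^2\right]^{1/2} + \left[\sum_{i=1}^{n}|b_i|^2\right]^{1/2}.$$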



The reader may recall from earlier courses that, for x and y in $R^3$ or $R^2$, we have that $\langle x, y\rangle = \|x\|\cdot\|y\|\cos\theta$, where $\theta$ ($0 \le \theta \le \pi$) denotes the angle between x and y. This equation implies (c) immediately since $|\cos\theta| \le 1$. Notice also that nonzero vectors x and y are perpendicular if and only if $\cos\theta = 0$, that is, if and only if $\langle x, y\rangle = 0$.

We are now at the point where we can generalize the notion of perpendic- 
ularity to arbitrary inner product spaces. 


Definitions. Let V be an inner product space. Vectors x and y in V are orthogonal (perpendicular) if $\langle x, y\rangle = 0$. A subset S of V is orthogonal if any two distinct vectors in S are orthogonal. A vector x in V is a unit vector if $\|x\| = 1$. Finally, a subset S of V is orthonormal if S is orthogonal and consists entirely of unit vectors.


Note that if $S = \{v_1, v_2, \ldots\}$, then S is orthonormal if and only if $\langle v_i, v_j\rangle = \delta_{ij}$, where $\delta_{ij}$ denotes the Kronecker delta. Also, observe that multiplying vectors by nonzero scalars does not affect their orthogonality and that if x is any nonzero vector, then $(1/\|x\|)x$ is a unit vector. The process of multiplying a nonzero vector by the reciprocal of its length is called normalizing.


Example 8
In $F^3$, $\{(1, 1, 0), (1, -1, 1), (-1, 1, 2)\}$ is an orthogonal set of nonzero vectors, but it is not orthonormal; however, if we normalize the vectors in the set, we obtain the orthonormal set
$$\left\{\frac{1}{\sqrt{2}}(1, 1, 0),\ \frac{1}{\sqrt{3}}(1, -1, 1),\ \frac{1}{\sqrt{6}}(-1, 1, 2)\right\}.$$
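Example 8 can be checked directly: the sketch verifies that the three vectors are pairwise orthogonal under the dot product and that dividing each by its length yields an orthonormal set, up to floating-point rounding.

```python
import math


def dot(x, y):
    """Real standard inner product; no conjugation is needed over R."""
    return sum(a * b for a, b in zip(x, y))


S = [(1, 1, 0), (1, -1, 1), (-1, 1, 2)]

# The original set is orthogonal (exactly, in integer arithmetic).
assert all(dot(S[i], S[j]) == 0 for i in range(3) for j in range(3) if i != j)


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(a / length for a in v)


U = [normalize(v) for v in S]
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0   # Kronecker delta
        assert math.isclose(dot(U[i], U[j]), expected, abs_tol=1e-12)
```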


Our next example is of an infinite orthonormal set that is important in 
analysis. This set is used in later examples in this chapter. 


Example 9
Recall the inner product space H (defined on page 332). We introduce an important orthonormal subset S of H. For what follows, i is the imaginary number such that $i^2 = -1$. For any integer n, let $f_n(t) = e^{int}$, where $0 \le t \le 2\pi$. (Recall that $e^{int} = \cos nt + i\sin nt$.) Now define $S = \{f_n : n \text{ is an integer}\}$.

Clearly S is a subset of H. Using the property that $\overline{e^{it}} = e^{-it}$ for every real number t, we have, for $m \neq n$,
$$\langle f_m, f_n\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{imt}\,\overline{e^{int}}\,dt = \frac{1}{2\pi}\int_0^{2\pi} e^{i(m-n)t}\,dt = \left.\frac{1}{2\pi i(m-n)}\,e^{i(m-n)t}\right|_0^{2\pi} = 0.$$

Also,
$$\langle f_n, f_n\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{i(n-n)t}\,dt = \frac{1}{2\pi}\int_0^{2\pi} 1\,dt = 1.$$

In other words, $\langle f_m, f_n\rangle = \delta_{mn}$.
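The relation $\langle f_m, f_n\rangle = \delta_{mn}$ can be observed numerically. The sketch approximates the integral defining the inner product of H by a left Riemann sum with N equally spaced points (N = 256 is an arbitrary choice). For these functions the sum is exact up to rounding whenever $|m - n| < N$, because the N-th roots of unity sum to zero. The helper names are our own.

```python
import cmath
import math


def inner_H(f, g, N=256):
    """Left Riemann-sum approximation of (1/2pi) * integral_0^{2pi} f(t) conj(g(t)) dt."""
    h = 2 * math.pi / N
    return sum(f(k * h) * g(k * h).conjugate() for k in range(N)) * h / (2 * math.pi)


def e(n):
    """The function f_n(t) = e^{int} of Example 9."""
    return lambda t: cmath.exp(1j * n * t)


for m in range(-5, 6):
    for n in range(-5, 6):
        expected = 1 if m == n else 0
        assert abs(inner_H(e(m), e(n)) - expected) < 1e-9
```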


EXERCISES 


1. Label the following statements as true or false.

(a) An inner product is a scalar-valued function on the set of ordered pairs of vectors.

(b) An inner product space must be over the field of real or complex numbers.

(c) An inner product is linear in both components.

(d) There is exactly one inner product on the vector space $R^n$.

(e) The triangle inequality only holds in finite-dimensional inner product spaces.

(f) Only square matrices have a conjugate-transpose.

(g) If x, y, and z are vectors in an inner product space such that $\langle x, y\rangle = \langle x, z\rangle$, then $y = z$.

(h) If $\langle x, y\rangle = 0$ for all x in an inner product space, then $y = 0$.


2. Let $x = (2, 1 + i, i)$ and $y = (2 - i, 2, 1 + 2i)$ be vectors in $C^3$. Compute $\langle x, y\rangle$, $\|x\|$, $\|y\|$, and $\|x + y\|$. Then verify both the Cauchy–Schwarz inequality and the triangle inequality.


3. In C([0,1]), let $f(t) = t$ and $g(t) = e^t$. Compute $\langle f, g\rangle$ (as defined in Example 3), $\|f\|$, $\|g\|$, and $\|f + g\|$. Then verify both the Cauchy–Schwarz inequality and the triangle inequality.


4. (a) Complete the proof in Example 5 that $\langle\cdot,\cdot\rangle$ is an inner product (the Frobenius inner product) on $M_{n\times n}(F)$.
(b) Use the Frobenius inner product to compute $\|A\|$, $\|B\|$, and $\langle A, B\rangle$ for
A=(; . ga B= (7 Ze 
3 a a i 


5. In $C^2$, show that $\langle x, y\rangle = xAy^*$ is an inner product, where


sa a 


Compute $\langle x, y\rangle$ for $x = (1 - i, 2 + 3i)$ and $y = (2 + i, 3 - 2i)$.
6. Complete the proof of Theorem 6.1.
7. Complete the proof of Theorem 6.2.
8. Provide reasons why each of the following is not an inner product on the given vector spaces.

(a) $\langle (a, b), (c, d)\rangle = ac - bd$ on $R^2$.

(b) $\langle A, B\rangle = \operatorname{tr}(A + B)$ on $M_{2\times 2}(R)$.

(c) $\langle f(x), g(x)\rangle = \int_0^1 f'(t)g(t)\,dt$ on P(R), where $'$ denotes differentiation.


9. Let $\beta$ be a basis for a finite-dimensional inner product space.
(a) Prove that if $\langle x, z\rangle = 0$ for all $z \in \beta$, then $x = 0$.
(b) Prove that if $\langle x, z\rangle = \langle y, z\rangle$ for all $z \in \beta$, then $x = y$.


10.† Let V be an inner product space, and suppose that x and y are orthogonal vectors in V. Prove that $\|x + y\|^2 = \|x\|^2 + \|y\|^2$. Deduce the Pythagorean theorem in $R^2$.


11. Prove the parallelogram law on an inner product space V; that is, show that
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \quad\text{for all } x, y \in V.$$
What does this equation state about parallelograms in $R^2$?
12.† Let $\{v_1, v_2, \ldots, v_k\}$ be an orthogonal set in V, and let $a_1, a_2, \ldots, a_k$ be scalars. Prove that
$$\left\|\sum_{i=1}^{k} a_i v_i\right\|^2 = \sum_{i=1}^{k}|a_i|^2\,\|v_i\|^2.$$


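13. Suppose that $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$ are two inner products on a vector space V. Prove that $\langle\cdot,\cdot\rangle = \langle\cdot,\cdot\rangle_1 + \langle\cdot,\cdot\rangle_2$ is another inner product on V.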


14. Let A and B be $n \times n$ matrices, and let c be a scalar. Prove that $(A + cB)^* = A^* + \bar{c}B^*$.


15. (a) Prove that if V is an inner product space, then $|\langle x, y\rangle| = \|x\|\cdot\|y\|$ if and only if one of the vectors x or y is a multiple of the other. Hint: If the identity holds and $y \neq 0$, let
$$a = \frac{\langle x, y\rangle}{\|y\|^2},$$
and let $z = x - ay$. Prove that y and z are orthogonal and
$$|a| = \frac{\|x\|}{\|y\|}.$$
Then apply Exercise 10 to $\|x\|^2 = \|ay + z\|^2$ to obtain $\|z\| = 0$.
(b) Derive a similar result for the equality $\|x + y\| = \|x\| + \|y\|$, and generalize it to the case of n vectors.


16. (a) Show that the vector space H with $\langle\cdot,\cdot\rangle$ defined on page 332 is an inner product space.
(b) Let $V = C([0,1])$, and define
$$\langle f, g\rangle = \int_0^{1/2} f(t)g(t)\,dt.$$
Is this an inner product on V?


17. Let T be a linear operator on an inner product space V, and suppose that $\|T(x)\| = \|x\|$ for all x. Prove that T is one-to-one.


18. Let V be a vector space over F, where $F = R$ or $F = C$, and let W be an inner product space over F with inner product $\langle\cdot,\cdot\rangle$. If $T\colon V \to W$ is linear, prove that $\langle x, y\rangle' = \langle T(x), T(y)\rangle$ defines an inner product on V if and only if T is one-to-one.


19. Let V be an inner product space. Prove that

(a) $\|x \pm y\|^2 = \|x\|^2 \pm 2\,\Re\langle x, y\rangle + \|y\|^2$ for all $x, y \in V$, where $\Re\langle x, y\rangle$ denotes the real part of the complex number $\langle x, y\rangle$.
(b) $\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\|$ for all $x, y \in V$.


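20. Let V be an inner product space over F. Prove the polar identities: For all $x, y \in V$,

(a) $\langle x, y\rangle = \frac{1}{4}\|x + y\|^2 - \frac{1}{4}\|x - y\|^2$ if $F = R$;
(b) $\langle x, y\rangle = \frac{1}{4}\sum_{k=1}^{4} i^k\|x + i^k y\|^2$ if $F = C$, where $i^2 = -1$.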
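A numerical check of the polar identities of Exercise 20, using the standard inner product on $R^3$ and $C^3$ and randomly chosen vectors. The random testing and the tolerance are assumptions of this sketch; the exercise itself asks for a proof.

```python
import random


def inner(x, y):
    """Standard inner product on R^n or C^n: sum of a_i * conj(b_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm_sq(x):
    return inner(x, x).real


def combo(x, y, c):
    """The vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]


def polar_real(x, y):
    """Identity (a): (1/4)||x + y||^2 - (1/4)||x - y||^2."""
    return (norm_sq(combo(x, y, 1)) - norm_sq(combo(x, y, -1))) / 4


def polar_complex(x, y):
    """Identity (b): (1/4) * sum_{k=1}^{4} i^k ||x + i^k y||^2."""
    return sum(1j ** k * norm_sq(combo(x, y, 1j ** k)) for k in range(1, 5)) / 4


random.seed(1)
for _ in range(500):
    xr = [random.uniform(-1, 1) for _ in range(3)]
    yr = [random.uniform(-1, 1) for _ in range(3)]
    assert abs(polar_real(xr, yr) - inner(xr, yr)) < 1e-10
    xc = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
    yc = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
    assert abs(polar_complex(xc, yc) - inner(xc, yc)) < 1e-10
```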


21. Let A be an $n \times n$ matrix. Define
$$A_1 = \frac{1}{2}(A + A^*) \quad\text{and}\quad A_2 = \frac{1}{2i}(A - A^*).$$

(a) Prove that $A_1^* = A_1$, $A_2^* = A_2$, and $A = A_1 + iA_2$. Would it be reasonable to define $A_1$ and $A_2$ to be the real and imaginary parts, respectively, of the matrix A?

(b) Let A be an $n \times n$ matrix. Prove that the representation in (a) is unique. That is, prove that if $A = B_1 + iB_2$, where $B_1^* = B_1$ and $B_2^* = B_2$, then $B_1 = A_1$ and $B_2 = A_2$.
22. Let V be a real or complex vector space (possibly infinite-dimensional), and let $\beta$ be a basis for V. For $x, y \in V$ there exist $v_1, v_2, \ldots, v_n \in \beta$ such that
$$x = \sum_{i=1}^{n} a_i v_i \quad\text{and}\quad y = \sum_{i=1}^{n} b_i v_i.$$

Define
$$\langle x, y\rangle = \sum_{i=1}^{n} a_i\overline{b_i}.$$

(a) Prove that $\langle\cdot,\cdot\rangle$ is an inner product on V and that $\beta$ is an orthonormal basis for V. Thus every real or complex vector space may be regarded as an inner product space.

(b) Prove that if $V = R^n$ or $V = C^n$ and $\beta$ is the standard ordered basis, then the inner product defined above is the standard inner product.


23. Let $V = F^n$, and let $A \in M_{n\times n}(F)$.

(a) Prove that $\langle x, Ay\rangle = \langle A^*x, y\rangle$ for all $x, y \in V$.

(b) Suppose that for some $B \in M_{n\times n}(F)$, we have $\langle x, Ay\rangle = \langle Bx, y\rangle$ for all $x, y \in V$. Prove that $B = A^*$.

(c) Let $\alpha$ be the standard ordered basis for V. For any orthonormal basis $\beta$ for V, let Q be the $n \times n$ matrix whose columns are the vectors in $\beta$. Prove that $Q^* = Q^{-1}$.

(d) Define linear operators T and U on V by $T(x) = Ax$ and $U(x) = A^*x$. Show that $[U]_\beta = [T]_\beta^*$ for any orthonormal basis $\beta$ for V.


The following definition is used in Exercises 24-27. 


Definition. Let V be a vector space over F, where F is either R or C. Regardless of whether V is or is not an inner product space, we may still define a norm $\|\cdot\|$ as a real-valued function on V satisfying the following three conditions for all $x, y \in V$ and $a \in F$:

(1) $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$.
(2) $\|ax\| = |a|\cdot\|x\|$.
(3) $\|x + y\| \le \|x\| + \|y\|$.


24. Prove that the following are norms on the given vector spaces V.

(a) $V = M_{m\times n}(F)$; $\|A\| = \max_{i,j}|A_{ij}|$ for all $A \in V$
(b) $V = C([0,1])$; $\|f\| = \max_{t\in[0,1]}|f(t)|$ for all $f \in V$
(c) $V = C([0,1])$; $\|f\| = \int_0^1 |f(t)|\,dt$ for all $f \in V$
(d) $V = R^2$; $\|(a, b)\| = \max\{|a|, |b|\}$ for all $(a, b) \in V$

25. Use Exercise 20 to show that there is no inner product $\langle\cdot,\cdot\rangle$ on $R^2$ such that $\|x\|^2 = \langle x, x\rangle$ for all $x \in R^2$ if the norm is defined as in Exercise 24(d).
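Exercise 25 asks for an argument through Exercise 20. A quicker observation, sketched below, is that any norm arising from an inner product satisfies the parallelogram law of Exercise 11, whereas the norm of Exercise 24(d) violates it for $x = (1, 0)$ and $y = (0, 1)$; the Euclidean norm is included for contrast. The helper names are our own.

```python
import math


def max_norm(v):
    """The norm of Exercise 24(d): ||(a, b)|| = max{|a|, |b|}."""
    return max(abs(a) for a in v)


def euclidean_norm(v):
    return math.sqrt(sum(a * a for a in v))


def parallelogram_gap(x, y, norm):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2; zero whenever the law holds."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2


x, y = (1, 0), (0, 1)
print(parallelogram_gap(x, y, max_norm))        # -2: the law fails
print(parallelogram_gap(x, y, euclidean_norm))  # zero up to rounding
```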


26. Let $\|\cdot\|$ be a norm on a vector space V, and define, for each ordered pair of vectors, the scalar $d(x, y) = \|x - y\|$, called the distance between x and y. Prove the following results for all $x, y, z \in V$.

(a) $d(x, y) \ge 0$.
(b) $d(x, y) = d(y, x)$.
(c) $d(x, y) \le d(x, z) + d(z, y)$.
(d) $d(x, x) = 0$.
(e) $d(x, y) \neq 0$ if $x \neq y$.


27. Let $\|\cdot\|$ be a norm on a real vector space V satisfying the parallelogram law given in Exercise 11. Define
$$\langle x, y\rangle = \frac{1}{4}\left[\|x + y\|^2 - \|x - y\|^2\right].$$
Prove that $\langle\cdot,\cdot\rangle$ defines an inner product on V such that $\|x\|^2 = \langle x, x\rangle$ for all $x \in V$.
Hints:

(a) Prove $\langle x, 2y\rangle = 2\langle x, y\rangle$ for all $x, y \in V$.
(b) Prove $\langle x + u, y\rangle = \langle x, y\rangle + \langle u, y\rangle$ for all $x, u, y \in V$.
(c) Prove $\langle nx, y\rangle = n\langle x, y\rangle$ for every positive integer n and every $x, y \in V$.
(d) Prove $m\left\langle \frac{1}{m}x, y\right\rangle = \langle x, y\rangle$ for every positive integer m and every $x, y \in V$.
(e) Prove $\langle rx, y\rangle = r\langle x, y\rangle$ for every rational number r and every $x, y \in V$.
(f) Prove $|\langle x, y\rangle| \le \|x\|\cdot\|y\|$ for every $x, y \in V$. Hint: Condition (3) in the definition of norm can be helpful.
(g) Prove that for every $c \in R$, every rational number r, and every $x, y \in V$,
$$|c\langle x, y\rangle - \langle cx, y\rangle| = |(c - r)\langle x, y\rangle - \langle (c - r)x, y\rangle| \le 2|c - r|\,\|x\|\,\|y\|.$$
(h) Use the fact that for any $c \in R$, $|c - r|$ can be made arbitrarily small, where r varies over the set of rational numbers, to establish item (b) of the definition of inner product.
28. Let V be a complex inner product space with an inner product $\langle\cdot,\cdot\rangle$. Let $[\cdot,\cdot]$ be the real-valued function such that $[x, y]$ is the real part of the complex number $\langle x, y\rangle$ for all $x, y \in V$. Prove that $[\cdot,\cdot]$ is an inner product for V, where V is regarded as a vector space over R. Prove, furthermore, that $[x, ix] = 0$ for all $x \in V$.


29. Let V be a vector space over C, and suppose that $[\cdot,\cdot]$ is a real inner product on V, where V is regarded as a vector space over R, such that $[x, ix] = 0$ for all $x \in V$. Let $\langle\cdot,\cdot\rangle$ be the complex-valued function defined by
$$\langle x, y\rangle = [x, y] + i[x, iy] \quad\text{for } x, y \in V.$$
Prove that $\langle\cdot,\cdot\rangle$ is a complex inner product on V.


30. Let $\|\cdot\|$ be a norm (as defined in Exercise 24) on a complex vector space V satisfying the parallelogram law given in Exercise 11. Prove that there is an inner product $\langle\cdot,\cdot\rangle$ on V such that $\|x\|^2 = \langle x, x\rangle$ for all $x \in V$.


Hint: Apply Exercise 27 to V regarded as a vector space over R. Then 
apply Exercise 29. 


6.2 THE GRAM-SCHMIDT ORTHOGONALIZATION PROCESS 
AND ORTHOGONAL COMPLEMENTS 


In previous chapters, we have seen the special role of the standard ordered bases for $C^n$ and $R^n$. The special properties of these bases stem from the fact that the basis vectors form an orthonormal set. Just as bases are the building blocks of vector spaces, bases that are also orthonormal sets are the building blocks of inner product spaces. We now name such bases.


Definition. Let V be an inner product space. A subset of V is an 
orthonormal basis for V if it is an ordered basis that is orthonormal. 


Example 1
The standard ordered basis for $F^n$ is an orthonormal basis for $F^n$.


Example 2 
The set 


is an orthonormal basis for R?. 
The next theorem and its corollaries illustrate why orthonormal sets and,
in particular, orthonormal bases are so important. 


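Theorem 6.3. Let V be an inner product space and $S = \{v_1, v_2, \ldots, v_k\}$ be an orthogonal subset of V consisting of nonzero vectors. If $y \in \operatorname{span}(S)$, then
$$y = \sum_{i=1}^{k}\frac{\langle y, v_i\rangle}{\|v_i\|^2}\,v_i.$$

Proof. Write $y = \sum_{i=1}^{k} a_i v_i$, where $a_1, a_2, \ldots, a_k \in F$. Then, for $1 \le j \le k$, we have
$$\langle y, v_j\rangle = \left\langle \sum_{i=1}^{k} a_i v_i, v_j\right\rangle = \sum_{i=1}^{k} a_i\langle v_i, v_j\rangle = a_j\langle v_j, v_j\rangle = a_j\|v_j\|^2.$$
So $a_j = \langle y, v_j\rangle/\|v_j\|^2$, and the result follows. ■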
The next corollary follows immediately from Theorem 6.3. 


Corollary 1. If, in addition to the hypotheses of Theorem 6.3, S is orthonormal and $y \in \operatorname{span}(S)$, then
$$y = \sum_{i=1}^{k}\langle y, v_i\rangle\,v_i.$$


If V possesses a finite orthonormal basis, then Corollary 1 allows us to 
compute the coefficients in a linear combination very easily. (See Example 3.) 


Corollary 2. Let V be an inner product space, and let S be an orthogonal 
subset of V consisting of nonzero vectors. Then S is linearly independent. 


Proof. Suppose that $v_1, v_2, \ldots, v_k \in S$ and
$$\sum_{i=1}^{k} a_i v_i = 0.$$

As in the proof of Theorem 6.3 with $y = 0$, we have $a_j = \langle 0, v_j\rangle/\|v_j\|^2 = 0$ for all j. So S is linearly independent. ■
Example 3
By Corollary 2, the orthonormal set
$$\left\{\frac{1}{\sqrt{2}}(1, 1, 0),\ \frac{1}{\sqrt{3}}(1, -1, 1),\ \frac{1}{\sqrt{6}}(-1, 1, 2)\right\}$$
obtained in Example 8 of Section 6.1 is an orthonormal basis for $R^3$. Let $x = (2, 1, 3)$. The coefficients given by Corollary 1 to Theorem 6.3 that express x as a linear combination of the basis vectors are
$$a_1 = \frac{1}{\sqrt{2}}(2 + 1) = \frac{3}{\sqrt{2}}, \qquad a_2 = \frac{1}{\sqrt{3}}(2 - 1 + 3) = \frac{4}{\sqrt{3}},$$
and
$$a_3 = \frac{1}{\sqrt{6}}(-2 + 1 + 6) = \frac{5}{\sqrt{6}}.$$
As a check, we have
$$(2, 1, 3) = \frac{3}{2}(1, 1, 0) + \frac{4}{3}(1, -1, 1) + \frac{5}{6}(-1, 1, 2).$$
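The coefficients in Example 3 are just the inner products $\langle x, v_i\rangle$ prescribed by Corollary 1. The sketch recomputes them in floating point and rebuilds x from them.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Orthonormal basis of R^3 from Example 8 of Section 6.1.
basis = [
    [a / math.sqrt(2) for a in (1, 1, 0)],
    [a / math.sqrt(3) for a in (1, -1, 1)],
    [a / math.sqrt(6) for a in (-1, 1, 2)],
]
x = [2, 1, 3]

coeffs = [dot(x, v) for v in basis]     # a_i = <x, v_i>  (Corollary 1)
rebuilt = [sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(3)]
print(coeffs)
print(rebuilt)                          # equals x up to rounding
```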


Corollary 2 tells us that the vector space H in Section 6.1 contains an 
infinite linearly independent set, and hence H is not a finite-dimensional vector 
space. 

Of course, we have not yet shown that every finite-dimensional inner
product space possesses an orthonormal basis. The next theorem takes us most
of the way in obtaining this result. It tells us how to construct an orthogonal 
set from a linearly independent set of vectors in such a way that both sets 
generate the same subspace. 

Before stating this theorem, let us consider a simple case. Suppose that $\{w_1, w_2\}$ is a linearly independent subset of an inner product space (and hence a basis for some two-dimensional subspace). We want to construct an orthogonal set from $\{w_1, w_2\}$ that spans the same subspace. Figure 6.1 suggests that the set $\{v_1, v_2\}$, where $v_1 = w_1$ and $v_2 = w_2 - c w_1$, has this property if c is chosen so that $v_2$ is orthogonal to $w_1$.

To find c, we need only solve the following equation:

$$0 = \langle v_2, w_1\rangle = \langle w_2 - c w_1, w_1\rangle = \langle w_2, w_1\rangle - c\langle w_1, w_1\rangle.$$

So

$$c = \frac{\langle w_2, w_1\rangle}{\|w_1\|^2}.$$

Thus

$$v_2 = w_2 - \frac{\langle w_2, w_1\rangle}{\|w_1\|^2}\, w_1.$$
Figure 6.1 (labels: $v_2$, $w_1 = v_1$)


The next theorem shows us that this process can be extended to any finite 
linearly independent subset. 


Theorem 6.4. Let V be an inner product space and $S = \{w_1, w_2, \ldots, w_n\}$ be a linearly independent subset of V. Define $S' = \{v_1, v_2, \ldots, v_n\}$, where $v_1 = w_1$ and

$$v_k = w_k - \sum_{j=1}^{k-1} \frac{\langle w_k, v_j\rangle}{\|v_j\|^2}\, v_j \quad \text{for } 2 \le k \le n. \tag{1}$$

Then S′ is an orthogonal set of nonzero vectors such that $\mathrm{span}(S') = \mathrm{span}(S)$.


Proof. The proof is by mathematical induction on n, the number of vectors in S. For $k = 1, 2, \ldots, n$, let $S_k = \{w_1, w_2, \ldots, w_k\}$. If $n = 1$, then the theorem is proved by taking $S'_1 = S_1$; i.e., $v_1 = w_1 \ne 0$. Assume then that the set $S'_{k-1} = \{v_1, v_2, \ldots, v_{k-1}\}$ with the desired properties has been constructed by the repeated use of (1). We show that the set $S'_k = \{v_1, v_2, \ldots, v_{k-1}, v_k\}$ also has the desired properties, where $v_k$ is obtained from $S'_{k-1}$ by (1). If $v_k = 0$, then (1) implies that $w_k \in \mathrm{span}(S'_{k-1}) = \mathrm{span}(S_{k-1})$, which contradicts the assumption that $S_k$ is linearly independent. For $1 \le i \le k-1$, it follows from (1) that

$$\langle v_k, v_i\rangle = \langle w_k, v_i\rangle - \sum_{j=1}^{k-1} \frac{\langle w_k, v_j\rangle}{\|v_j\|^2}\langle v_j, v_i\rangle = \langle w_k, v_i\rangle - \frac{\langle w_k, v_i\rangle}{\|v_i\|^2}\|v_i\|^2 = 0,$$

since $\langle v_j, v_i\rangle = 0$ if $i \ne j$ by the induction assumption that $S'_{k-1}$ is orthogonal. Hence $S'_k$ is an orthogonal set of nonzero vectors. Now, by (1), we have that $\mathrm{span}(S'_k) \subseteq \mathrm{span}(S_k)$. But by Corollary 2 to Theorem 6.3, $S'_k$ is linearly independent; so $\dim(\mathrm{span}(S'_k)) = \dim(\mathrm{span}(S_k)) = k$. Therefore $\mathrm{span}(S'_k) = \mathrm{span}(S_k)$. ∎


The construction of $\{v_1, v_2, \ldots, v_n\}$ by the use of Theorem 6.4 is called
the Gram–Schmidt process.
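Equation (1) translates almost line for line into code. The following is a minimal sketch, assuming vectors in $\mathbb{R}^n$ with integer or rational entries so that `fractions.Fraction` keeps the arithmetic exact; `gram_schmidt` and `dot` are hypothetical helper names. It returns the orthogonal, unnormalized set S′, and each coefficient is computed from the original $w_k$, exactly as in (1).

```python
from fractions import Fraction


def dot(x, y):
    """Standard (real) inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(ws):
    """Return [v_1, ..., v_n] built from [w_1, ..., w_n] by equation (1).

    Assumes ws is linearly independent with rational entries; the
    output is orthogonal but not normalized.
    """
    vs = []
    for w in ws:
        v = [Fraction(c) for c in w]
        for u in vs:
            # subtract (<w_k, v_j> / ||v_j||^2) v_j for each earlier v_j
            coeff = Fraction(dot(w, u)) / dot(u, u)
            v = [vi - coeff * ui for vi, ui in zip(v, u)]
        if all(c == 0 for c in v):
            raise ValueError("input vectors are linearly dependent")
        vs.append(v)
    return vs


# The vectors of Example 4.
vs = gram_schmidt([(1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 2, 1)])
print([[str(c) for c in v] for v in vs])
```

Run on the vectors of Example 4, this sketch should produce $v_1 = (1,0,1,0)$, $v_2 = (0,1,0,1)$, $v_3 = (-1,0,1,0)$; normalizing is then a separate step, as in the text.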


Sec. 6.2. Gram-Schmidt Orthogonalization Process 345 


Example 4 

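In $\mathbb{R}^4$, let $w_1 = (1,0,1,0)$, $w_2 = (1,1,1,1)$, and $w_3 = (0,1,2,1)$. Then $\{w_1, w_2, w_3\}$ is linearly independent. We use the Gram–Schmidt process to compute the orthogonal vectors $v_1$, $v_2$, and $v_3$, and then we normalize these vectors to obtain an orthonormal set.

Take $v_1 = w_1 = (1,0,1,0)$. Then

$$v_2 = w_2 - \frac{\langle w_2, v_1\rangle}{\|v_1\|^2}\, v_1 = (1,1,1,1) - \frac{2}{2}(1,0,1,0) = (0,1,0,1).$$

Finally,

$$v_3 = w_3 - \frac{\langle w_3, v_1\rangle}{\|v_1\|^2}\, v_1 - \frac{\langle w_3, v_2\rangle}{\|v_2\|^2}\, v_2 = (0,1,2,1) - \frac{2}{2}(1,0,1,0) - \frac{2}{2}(0,1,0,1) = (-1,0,1,0).$$

These vectors can be normalized to obtain the orthonormal basis $\{u_1, u_2, u_3\}$ for $\mathrm{span}(\{w_1, w_2, w_3\})$, where

$$u_1 = \frac{1}{\|v_1\|}\, v_1 = \frac{1}{\sqrt{2}}(1,0,1,0), \qquad u_2 = \frac{1}{\|v_2\|}\, v_2 = \frac{1}{\sqrt{2}}(0,1,0,1),$$

and

$$u_3 = \frac{1}{\|v_3\|}\, v_3 = \frac{1}{\sqrt{2}}(-1,0,1,0).$$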
Example 5 
Let $V = P(\mathbb{R})$ with the inner product $\langle f(x), g(x)\rangle = \int_{-1}^{1} f(t)g(t)\, dt$, and consider the subspace $P_2(\mathbb{R})$ with the standard ordered basis β. We use the Gram–Schmidt process to replace β by an orthogonal basis $\{v_1, v_2, v_3\}$ for $P_2(\mathbb{R})$, and then use this orthogonal basis to obtain an orthonormal basis for $P_2(\mathbb{R})$.


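Take $v_1 = 1$. Then $\|v_1\|^2 = \int_{-1}^{1} 1^2\, dt = 2$, and $\langle x, v_1\rangle = \int_{-1}^{1} t \cdot 1\, dt = 0$. Thus

$$v_2 = x - \frac{\langle x, v_1\rangle}{\|v_1\|^2}\, v_1 = x - \frac{0}{2} = x.$$

Furthermore,

$$\langle x^2, v_1\rangle = \int_{-1}^{1} t^2 \cdot 1\, dt = \frac{2}{3} \quad\text{and}\quad \langle x^2, v_2\rangle = \int_{-1}^{1} t^2 \cdot t\, dt = 0.$$

Therefore

$$v_3 = x^2 - \frac{\langle x^2, v_1\rangle}{\|v_1\|^2}\, v_1 - \frac{\langle x^2, v_2\rangle}{\|v_2\|^2}\, v_2 = x^2 - \frac{1}{3} \cdot 1 - 0 \cdot x = x^2 - \frac{1}{3}.$$

We conclude that $\{1, x, x^2 - \frac{1}{3}\}$ is an orthogonal basis for $P_2(\mathbb{R})$.
To obtain an orthonormal basis, we normalize $v_1$, $v_2$, and $v_3$ to obtain

$$u_1 = \frac{1}{\sqrt{\int_{-1}^{1} 1^2\, dt}} = \frac{1}{\sqrt{2}}, \qquad u_2 = \frac{x}{\sqrt{\int_{-1}^{1} t^2\, dt}} = \sqrt{\frac{3}{2}}\, x,$$

and similarly, since $\|v_3\|^2 = \int_{-1}^{1} \left(t^2 - \frac{1}{3}\right)^2 dt = \frac{8}{45}$,

$$u_3 = \frac{v_3}{\|v_3\|} = \sqrt{\frac{5}{8}}\,(3x^2 - 1).$$

Thus $\{u_1, u_2, u_3\}$ is the desired orthonormal basis for $P_2(\mathbb{R})$.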


If we continue applying the Gram–Schmidt orthogonalization process to
the basis $\{1, x, x^2, \ldots\}$ for $P(\mathbb{R})$, we obtain an orthogonal basis whose elements
are called the Legendre polynomials. The orthogonal polynomials $v_1$, $v_2$, and
$v_3$ in Example 5 are the first three Legendre polynomials.
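Example 5 can be continued mechanically by representing a polynomial $c_0 + c_1 t + c_2 t^2 + \cdots$ as its coefficient list $[c_0, c_1, c_2, \ldots]$. The sketch below assumes the inner product $\langle f, g\rangle = \int_{-1}^{1} f(t)g(t)\,dt$ of Example 5 and evaluates it exactly using $\int_{-1}^{1} t^m\,dt = \frac{2}{m+1}$ for even m and 0 for odd m; the helper names `inner`, `sub_scaled`, and `gram_schmidt_polys` are our own.

```python
from fractions import Fraction


def inner(p, q):
    """<p, q> = integral of p(t) q(t) over [-1, 1], computed exactly.

    Polynomials are coefficient lists [c0, c1, ...] for c0 + c1 t + ...
    """
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            m = i + j
            if m % 2 == 0:  # odd powers integrate to 0 on [-1, 1]
                total += Fraction(a) * b * Fraction(2, m + 1)
    return total


def sub_scaled(p, c, q):
    """Return the coefficient list of p - c*q (padding with zeros)."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [Fraction(a) - c * b for a, b in zip(p, q)]


def gram_schmidt_polys(n):
    """Apply equation (1) to 1, x, ..., x^(n-1) with respect to inner()."""
    vs = []
    for k in range(n):
        w = [0] * k + [1]  # the monomial x^k
        v = [Fraction(c) for c in w]
        for u in vs:
            v = sub_scaled(v, inner(w, u) / inner(u, u), u)
        vs.append(v)
    return vs


for v in gram_schmidt_polys(4):
    print([str(c) for c in v])
```

The first three lists should match $1$, $x$, $x^2 - \frac13$ from Example 5, and the fourth should be $x^3 - \frac35 x$. Since $x^3 - v_4$ is the orthogonal projection of $x^3$ on $P_2(\mathbb{R})$, the same value $\frac35 x$ appears in Example 10.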

The following result gives us a simple method of representing a vector as 
a linear combination of the vectors in an orthonormal basis. 


Theorem 6.5. Let V be a nonzero finite-dimensional inner product space. Then V has an orthonormal basis β. Furthermore, if $\beta = \{v_1, v_2, \ldots, v_n\}$ and $x \in V$, then

$$x = \sum_{i=1}^{n} \langle x, v_i\rangle v_i.$$

Proof. Let $\beta_0$ be an ordered basis for V. Apply Theorem 6.4 to obtain an orthogonal set β′ of nonzero vectors with $\mathrm{span}(\beta') = \mathrm{span}(\beta_0) = V$. By normalizing each vector in β′, we obtain an orthonormal set β that generates V. By Corollary 2 to Theorem 6.3, β is linearly independent; therefore β is an orthonormal basis for V. The remainder of the theorem follows from Corollary 1 to Theorem 6.3. ∎


Example 6 


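We use Theorem 6.5 to represent the polynomial $f(x) = 1 + 2x + 3x^2$ as a linear combination of the vectors in the orthonormal basis $\{u_1, u_2, u_3\}$ for $P_2(\mathbb{R})$ obtained in Example 5. Observe that

$$\langle f(x), u_1\rangle = \int_{-1}^{1} \frac{1}{\sqrt{2}}(1 + 2t + 3t^2)\, dt = 2\sqrt{2},$$

$$\langle f(x), u_2\rangle = \int_{-1}^{1} \sqrt{\frac{3}{2}}\, t\,(1 + 2t + 3t^2)\, dt = \frac{2\sqrt{6}}{3},$$

and

$$\langle f(x), u_3\rangle = \int_{-1}^{1} \sqrt{\frac{5}{8}}\,(3t^2 - 1)(1 + 2t + 3t^2)\, dt = \frac{2\sqrt{10}}{5}.$$

Therefore $f(x) = 2\sqrt{2}\, u_1 + \frac{2\sqrt{6}}{3}\, u_2 + \frac{2\sqrt{10}}{5}\, u_3$.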

Theorem 6.5 gives us a simple method for computing the entries of the 
matrix representation of a linear operator with respect to an orthonormal 
basis. 


Corollary. Let V be a finite-dimensional inner product space with an orthonormal basis $\beta = \{v_1, v_2, \ldots, v_n\}$. Let T be a linear operator on V, and let $A = [T]_\beta$. Then for any i and j, $A_{ij} = \langle T(v_j), v_i\rangle$.

Proof. From Theorem 6.5, we have

$$T(v_j) = \sum_{i=1}^{n} \langle T(v_j), v_i\rangle v_i.$$

Hence $A_{ij} = \langle T(v_j), v_i\rangle$. ∎
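The corollary gives each entry of $[T]_\beta$ as a single inner product. As a minimal sketch, assuming $V = \mathbb{R}^3$ with the standard inner product, T of the form $L_M$ for a hypothetical matrix M chosen only for illustration, and β the orthonormal basis of Example 3, the following code fills in $A_{ij} = \langle T(v_j), v_i\rangle$ and then checks that column j really holds the coordinates of $T(v_j)$. The helper names are our own.

```python
import math


def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def apply(M, x):
    """Return the matrix-vector product M x (M is a list of rows)."""
    return [dot(row, x) for row in M]


def matrix_wrt_orthonormal_basis(M, basis):
    """Return [T]_beta for T = L_M, using A_ij = <T(v_j), v_i>.

    Assumes `basis` is an orthonormal basis of R^n listed in order.
    """
    n = len(basis)
    return [[dot(apply(M, basis[j]), basis[i]) for j in range(n)]
            for i in range(n)]


# The orthonormal basis of R^3 from Example 3.
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
beta = [
    [1 / r2, 1 / r2, 0.0],
    [1 / r3, -1 / r3, 1 / r3],
    [-1 / r6, 1 / r6, 2 / r6],
]

# A hypothetical operator T = L_M, chosen only for illustration.
M = [[2.0, 1.0, 0.0],
     [0.0, 3.0, -1.0],
     [1.0, 0.0, 1.0]]
A = matrix_wrt_orthonormal_basis(M, beta)

# Column j of A should give the coordinates of T(v_j) relative to beta.
for j, v in enumerate(beta):
    rebuilt = [sum(A[i][j] * beta[i][k] for i in range(3)) for k in range(3)]
    assert all(math.isclose(p, q, abs_tol=1e-12)
               for p, q in zip(rebuilt, apply(M, v)))
```

No matrix inversion is needed: orthonormality of β is what lets a single inner product replace solving a linear system for each column.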


The scalars $\langle x, v_i\rangle$ given in Theorem 6.5 have been studied extensively
for special inner product spaces. Although the vectors $v_1, v_2, \ldots, v_n$ were
chosen from an orthonormal basis, we introduce a terminology associated
with orthonormal sets β in more general inner product spaces.

Definition. Let β be an orthonormal subset (possibly infinite) of an
inner product space V, and let $x \in V$. We define the Fourier coefficients
of x relative to β to be the scalars $\langle x, y\rangle$, where $y \in \beta$.


In the first half of the 19th century, the French mathematician Jean Baptiste Fourier was associated with the study of the scalars

$$\int_0^{2\pi} f(t)\sin nt\, dt \quad\text{and}\quad \int_0^{2\pi} f(t)\cos nt\, dt,$$

or more generally,

$$c_n = \frac{1}{2\pi}\int_0^{2\pi} f(t)\, e^{-int}\, dt$$

for a function f. In the context of Example 9 of Section 6.1, we see that $c_n = \langle f, f_n\rangle$, where $f_n(t) = e^{int}$; that is, $c_n$ is the nth Fourier coefficient for a continuous function $f \in V$ relative to S. These coefficients are the "classical" Fourier coefficients of a function, and the literature concerning the behavior of these coefficients is extensive. We learn more about these Fourier coefficients in the remainder of this chapter.


Example 7 


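Let $S = \{e^{int} : n \text{ is an integer}\}$. In Example 9 of Section 6.1, S was shown to be an orthonormal set in H. We compute the Fourier coefficients of $f(t) = t$ relative to S. Using integration by parts, we have, for $n \ne 0$,

$$\langle f, f_n\rangle = \frac{1}{2\pi}\int_0^{2\pi} t\,\overline{e^{int}}\, dt = \frac{1}{2\pi}\int_0^{2\pi} t\, e^{-int}\, dt = \frac{-1}{in},$$

and, for $n = 0$,

$$\langle f, 1\rangle = \frac{1}{2\pi}\int_0^{2\pi} t \cdot 1\, dt = \pi.$$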


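As a result of these computations, and using Exercise 16 of this section, we obtain an upper bound for the sum of a special infinite series as follows:

$$\|f\|^2 \ge \sum_{n=-k}^{-1} |\langle f, f_n\rangle|^2 + |\langle f, 1\rangle|^2 + \sum_{n=1}^{k} |\langle f, f_n\rangle|^2 = \sum_{n=-k}^{-1}\frac{1}{n^2} + \pi^2 + \sum_{n=1}^{k}\frac{1}{n^2} = 2\sum_{n=1}^{k}\frac{1}{n^2} + \pi^2$$

for every k. Now, using the fact that $\|f\|^2 = \frac{4}{3}\pi^2$, we obtain

$$\frac{4}{3}\pi^2 \ge 2\sum_{n=1}^{k}\frac{1}{n^2} + \pi^2,$$

or

$$\frac{\pi^2}{6} \ge \sum_{n=1}^{k}\frac{1}{n^2}.$$

Because this inequality holds for all k, we may let $k \to \infty$ to obtain

$$\frac{\pi^2}{6} \ge \sum_{n=1}^{\infty}\frac{1}{n^2}.$$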


Additional results may be produced by replacing f by other functions. 
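The values in Example 7 can be cross-checked numerically. The sketch below approximates $\langle f, f_n\rangle = \frac{1}{2\pi}\int_0^{2\pi} f(t)e^{-int}\,dt$ with a composite midpoint rule (a numerical approximation we chose for illustration, not the integration by parts used above) and compares it with the exact values $\frac{-1}{in} = \frac{i}{n}$ and $\pi$. It then confirms that the partial sums in the Bessel bound stay below $\|f\|^2 = \frac43\pi^2$. The function name `fourier_coefficient` and the `samples` parameter are hypothetical.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=20000):
    """Approximate <f, f_n> = (1/(2*pi)) * integral_0^{2*pi} f(t) e^{-int} dt.

    Uses the composite midpoint rule, so the result is a numerical
    approximation rather than the exact integral.
    """
    h = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        t = (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2 * math.pi)


def f(t):
    return t


# Compare with the exact values <f, f_n> = -1/(in) = i/n and <f, 1> = pi.
for n in (1, 2, 3):
    print(n, fourier_coefficient(f, n), 1j / n)
print(0, fourier_coefficient(f, 0), math.pi)

# Bessel's inequality: the partial sums never exceed ||f||^2 = 4*pi^2/3.
norm_sq = 4 * math.pi ** 2 / 3
for k in (1, 10, 1000):
    partial = math.pi ** 2 + 2 * sum(1 / m ** 2 for m in range(1, k + 1))
    assert partial <= norm_sq
```

The numerical check only illustrates the computation; the exact inequality $\sum 1/n^2 \le \pi^2/6$ comes from the argument in the text.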




We are now ready to proceed with the concept of an orthogonal complement.

Definition. Let S be a nonempty subset of an inner product space V. We
define $S^{\perp}$ (read "S perp") to be the set of all vectors in V that are orthogonal
to every vector in S; that is, $S^{\perp} = \{x \in V : \langle x, y\rangle = 0 \text{ for all } y \in S\}$. The set
$S^{\perp}$ is called the orthogonal complement of S.


It is easily seen that $S^{\perp}$ is a subspace of V for any subset S of V.


Example 8 


The reader should verify that $\{0\}^{\perp} = V$ and $V^{\perp} = \{0\}$ for any inner product
space V.


Example 9 
If $V = \mathbb{R}^3$ and $S = \{e_3\}$, then $S^{\perp}$ equals the xy-plane (see Exercise 5).


Exercise 18 provides an interesting example of an orthogonal complement 
in an infinite-dimensional inner product space. 

Consider the problem in $\mathbb{R}^3$ of finding the distance from a point P to a
plane W. (See Figure 6.2.) Problems of this type arise in many settings. If
we let y be the vector determined by 0 and P, we may restate the problem
as follows: Determine the vector u in W that is "closest" to y. The desired
distance is clearly given by $\|y - u\|$. Notice from the figure that the vector
$z = y - u$ is orthogonal to every vector in W, and so $z \in W^{\perp}$.

The next result presents a practical method of finding u in the case that
W is a finite-dimensional subspace of an inner product space.

Figure 6.2 (label: $z = y - u$)


Theorem 6.6. Let W be a finite-dimensional subspace of an inner product space V, and let $y \in V$. Then there exist unique vectors $u \in W$ and $z \in W^{\perp}$ such that $y = u + z$. Furthermore, if $\{v_1, v_2, \ldots, v_k\}$ is an orthonormal basis for W, then

$$u = \sum_{i=1}^{k} \langle y, v_i\rangle v_i.$$

Proof. Let $\{v_1, v_2, \ldots, v_k\}$ be an orthonormal basis for W, let u be as defined in the preceding equation, and let $z = y - u$. Clearly $u \in W$ and $y = u + z$.

To show that $z \in W^{\perp}$, it suffices to show, by Exercise 7, that z is orthogonal to each $v_j$. For any j, we have

$$\langle z, v_j\rangle = \Big\langle y - \sum_{i=1}^{k}\langle y, v_i\rangle v_i,\ v_j\Big\rangle = \langle y, v_j\rangle - \sum_{i=1}^{k}\langle y, v_i\rangle\langle v_i, v_j\rangle = \langle y, v_j\rangle - \langle y, v_j\rangle = 0.$$

To show uniqueness of u and z, suppose that $y = u + z = u' + z'$, where $u' \in W$ and $z' \in W^{\perp}$. Then $u - u' = z' - z \in W \cap W^{\perp} = \{0\}$. Therefore, $u = u'$ and $z = z'$. ∎


Corollary. In the notation of Theorem 6.6, the vector u is the unique vector in W that is "closest" to y; that is, for any $x \in W$, $\|y - x\| \ge \|y - u\|$, and this inequality is an equality if and only if $x = u$.

Proof. As in Theorem 6.6, we have that $y = u + z$, where $z \in W^{\perp}$. Let $x \in W$. Then $u - x$ is orthogonal to z, so, by Exercise 10 of Section 6.1, we have

$$\|y - x\|^2 = \|u + z - x\|^2 = \|(u - x) + z\|^2 = \|u - x\|^2 + \|z\|^2 \ge \|z\|^2 = \|y - u\|^2.$$

Now suppose that $\|y - x\| = \|y - u\|$. Then the inequality above becomes an equality, and therefore $\|u - x\|^2 + \|z\|^2 = \|z\|^2$. It follows that $\|u - x\| = 0$, and hence $x = u$. The proof of the converse is obvious. ∎


The vector u in the corollary is called the orthogonal projection of y
on W. We will see the importance of orthogonal projections of vectors in the
application to least squares in Section 6.3.
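Theorem 6.6 and its corollary suggest a direct computation of the orthogonal projection whenever an orthonormal basis of W is at hand. The sketch below assumes $V = \mathbb{R}^4$ with the standard inner product, takes W to be spanned by the orthonormal vectors $u_1, u_2$ of Example 4, and uses a hypothetical vector $y = (1,2,3,4)$ chosen only for illustration. It forms $u = \sum_i \langle y, v_i\rangle v_i$, checks that $z = y - u$ lies in $W^{\perp}$, and confirms on a few sample vectors of W that none is closer to y. The helper name `project` is our own.

```python
import math


def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def project(y, onb):
    """Return u = sum_i <y, v_i> v_i (Theorem 6.6); onb must be orthonormal."""
    u = [0.0] * len(y)
    for v in onb:
        c = dot(y, v)
        u = [ui + c * vi for ui, vi in zip(u, v)]
    return u


s = 1 / math.sqrt(2)
onb = [[s, 0.0, s, 0.0], [0.0, s, 0.0, s]]  # u_1, u_2 from Example 4
y = [1.0, 2.0, 3.0, 4.0]                     # hypothetical vector
u = project(y, onb)
z = [yi - ui for yi, ui in zip(y, u)]
print(u, z)

# z lies in W-perp ...
assert all(abs(dot(z, v)) < 1e-12 for v in onb)

# ... and no sampled vector of W is closer to y (corollary to Theorem 6.6).
dist = math.dist(y, u)
for a, b in [(1, 0), (0, 1), (2.5, -1), (3, 4)]:
    x = [a * p + b * q for p, q in zip(*onb)]
    assert math.dist(y, x) >= dist
```

If only an arbitrary basis of W is known, one would first orthonormalize it with the Gram–Schmidt process, as in the proof of Theorem 6.5.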


Example 10 
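Let $V = P_3(\mathbb{R})$ with the inner product

$$\langle f(x), g(x)\rangle = \int_{-1}^{1} f(t)g(t)\, dt \quad\text{for all } f(x), g(x) \in V.$$

We compute the orthogonal projection $f_1(x)$ of $f(x) = x^3$ on $P_2(\mathbb{R})$.
By Example 5,

$$\{u_1, u_2, u_3\} = \left\{ \frac{1}{\sqrt{2}},\ \sqrt{\frac{3}{2}}\,x,\ \sqrt{\frac{5}{8}}\,(3x^2 - 1) \right\}$$

is an orthonormal basis for $P_2(\mathbb{R})$. For these vectors, we have

$$\langle f(x), u_1\rangle = \int_{-1}^{1} t^3\, \frac{1}{\sqrt{2}}\, dt = 0, \qquad \langle f(x), u_2\rangle = \int_{-1}^{1} t^3 \sqrt{\frac{3}{2}}\, t\, dt = \frac{\sqrt{6}}{5},$$

and

$$\langle f(x), u_3\rangle = \int_{-1}^{1} t^3 \sqrt{\frac{5}{8}}\,(3t^2 - 1)\, dt = 0.$$

Hence

$$f_1(x) = \langle f(x), u_1\rangle u_1 + \langle f(x), u_2\rangle u_2 + \langle f(x), u_3\rangle u_3 = \frac{3}{5}x.$$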


It was shown (Corollary 2 to the replacement theorem, p. 47) that any linearly
independent set in a finite-dimensional vector space can be extended to
a basis. The next theorem provides an interesting analog for an orthonormal 
subset of a finite-dimensional inner product space. 


352 Chap. 6 Inner Product Spaces 


Theorem 6.7. Suppose that $S = \{v_1, v_2, \ldots, v_k\}$ is an orthonormal set in an n-dimensional inner product space V. Then
(a) S can be extended to an orthonormal basis $\{v_1, v_2, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ for V.
(b) If $W = \mathrm{span}(S)$, then $S_1 = \{v_{k+1}, v_{k+2}, \ldots, v_n\}$ is an orthonormal basis for $W^{\perp}$ (using the preceding notation).
(c) If W is any subspace of V, then $\dim(V) = \dim(W) + \dim(W^{\perp})$.

Proof. (a) By Corollary 2 to the replacement theorem (p. 47), S can be extended to an ordered basis $S' = \{v_1, v_2, \ldots, v_k, w_{k+1}, \ldots, w_n\}$ for V. Now apply the Gram–Schmidt process to S′. The first k vectors resulting from this process are the vectors in S by Exercise 8, and this new set spans V. Normalizing the last $n - k$ vectors of this set produces an orthonormal set that spans V. The result now follows.

(b) Because $S_1$ is a subset of a basis, it is linearly independent. Since $S_1$ is clearly a subset of $W^{\perp}$, we need only show that it spans $W^{\perp}$. Note that, for any $x \in V$, we have

$$x = \sum_{i=1}^{n} \langle x, v_i\rangle v_i.$$

If $x \in W^{\perp}$, then $\langle x, v_i\rangle = 0$ for $1 \le i \le k$. Therefore,

$$x = \sum_{i=k+1}^{n} \langle x, v_i\rangle v_i \in \mathrm{span}(S_1).$$

(c) Let W be a subspace of V. It is a finite-dimensional inner product space because V is, and so it has an orthonormal basis $\{v_1, v_2, \ldots, v_k\}$. By (a) and (b), we have

$$\dim(V) = n = k + (n - k) = \dim(W) + \dim(W^{\perp}). \quad ∎$$


Example 11 

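Let $W = \mathrm{span}(\{e_1, e_2\})$ in $F^3$. Then $x = (a, b, c) \in W^{\perp}$ if and only if $0 = \langle x, e_1\rangle = a$ and $0 = \langle x, e_2\rangle = b$. So $x = (0, 0, c)$, and therefore $W^{\perp} = \mathrm{span}(\{e_3\})$. One can deduce the same result by noting that $e_3 \in W^{\perp}$ and, from (c), that $\dim(W^{\perp}) = 3 - 2 = 1$.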
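Parts (a) and (b) of Theorem 6.7 also give a recipe for computing $W^{\perp}$. The sketch below assumes real vectors with rational entries and works with orthogonal rather than orthonormal vectors so that `fractions.Fraction` keeps everything exact. One deviation from the proof should be flagged: instead of first extracting a basis S′ via the replacement theorem, it appends the whole standard basis $e_1, \ldots, e_n$ and lets a variant of the Gram–Schmidt process skip any vector that becomes zero (that is, any vector already in the span so far). The names `orthogonalize` and `perp_basis` are hypothetical.

```python
from fractions import Fraction


def dot(x, y):
    """Standard (real) inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def orthogonalize(vectors):
    """Gram-Schmidt variant that skips vectors already in the span so far.

    Returns an orthogonal (not normalized) list of Fraction vectors
    spanning the same subspace as `vectors`.
    """
    out = []
    for w in vectors:
        v = [Fraction(c) for c in w]
        for u in out:
            c = dot(v, u) / dot(u, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        if any(v):  # keep only nonzero results
            out.append(v)
    return out


def perp_basis(W_basis, n):
    """Orthogonal basis of W-perp in R^n, in the spirit of Theorem 6.7(a), (b).

    W_basis is assumed linearly independent; it is extended by the
    standard basis and the vectors produced after the first k = dim(W)
    are returned.
    """
    k = len(W_basis)
    std = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    full = orthogonalize(list(W_basis) + std)
    return full[k:]


# Example 11 with F = R: W = span({e1, e2}) in R^3.
print(perp_basis([[1, 0, 0], [0, 1, 0]], 3))
```

For Example 11 this should return a single vector equal to $e_3$, in agreement with $\dim(W^{\perp}) = 3 - 2 = 1$ from part (c).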


EXERCISES 


1. Label the following statements as true or false. 


(a) The Gram–Schmidt orthogonalization process allows us to construct an orthonormal set from an arbitrary set of vectors.
(b) Every nonzero finite-dimensional inner product space has an orthonormal basis.
(c) The orthogonal complement of any set is a subspace.
(d) If $\{v_1, v_2, \ldots, v_n\}$ is a basis for an inner product space V, then for any $x \in V$ the scalars $\langle x, v_i\rangle$ are the Fourier coefficients of x.
(e) An orthonormal basis must be an ordered basis.
(f) Every orthogonal set is linearly independent.
(g) Every orthonormal set is linearly independent.


2. In each part, apply the Gram–Schmidt process to the given subset S of
the inner product space V to obtain an orthogonal basis for span(S).
Then normalize the vectors in this basis to obtain an orthonormal basis
β for span(S), and compute the Fourier coefficients of the given vector
relative to β. Finally, use Theorem 6.5 to verify your result.


(a) $V = \mathbb{R}^3$, $S = \{(1, 0, 1), (0, 1, 1), (1, 3, 3)\}$, and $x = (1, \ldots)$
(b) $V = \mathbb{R}^3$, $S = \{(1, 1, 1), (0, 1, 1), (0, 0, 1)\}$, and $x = (1, 0, \ldots)$
(c) $V = P_2(\mathbb{R})$ with the inner product $\langle f(x), g(x)\rangle = \int_0^1 f(t)g(t)\, dt$, $S = \{1, x, x^2\}$, and $h(x) = 1 + x$
(d) $V = \mathrm{span}(S)$, where $S = \{(1, i, 0), (1 - i, 2, 4i)\}$, and $x = (3 + i, 4i, -4)$
(e) $V = \mathbb{R}^4$, $S = \{(2, -1, -2, 4), (-2, 1, -5, 5), (-1, 3, 7, 11)\}$, and $x = (-11, 8, -4, 18)$
(f) $V = \mathbb{R}^4$, $S = \{(1, -2, -1, 3), (3, 6, 3, -1), (1, 4, 2, 8)\}$, …
(g) Femitn s~{(3 C4 2G 2pm 7
(h) wt 8 {C2 )(2 8) 2B) mea
(i) $V = \mathrm{span}(S)$ with the inner product $\langle f, g\rangle = \int_0^{\pi} f(t)g(t)\, dt$, $S = \{\sin t, \cos t, 1, t\}$, and $h(t) = 2t + 1$
(j) $V = \mathbb{C}^4$, $S = \{(1, i, 2 - i, -1), (2 + 3i, 3i, 1 - i, 2i), (-1 + 7i, 6 + 10i, 11 - 4i, 3 + 4i)\}$, and $x = (-2 + 7i, 6 + 9i, 9 - 3i, 4 + 4i)$
(k) $V = \mathbb{C}^4$, $S = \{(-4, 3 - 2i, i, 1 - 4i), (-1 - 5i, 5 - 4i, -3 + 5i, 7 - 2i), (-27 - i, -7 - 6i, -15 + 25i, -7 - 6i)\}$, and $x = (-13 - 7i, -12 + 3i, -39 - 11i, -26 + 5i)$
Pe: = 3; 8i 4
(1) V= Maal). 8={ (33 4+i Wes a ge) 
25-38 —2-13)\] , 4 (-2+8) —13+4 
1 7er Se a (0 ONO 10k Gs 
1+i i 1-7i -9-—8i 
(m) V= Maal) 8=4(5'%, me newer =) 


11— 132i —34—31/\] |, y_ (—7+5i 3+ 18 
T= =TPSse ele OO AO er -- 2 te 


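3. In $\mathbb{R}^2$, let

$$\beta = \left\{ \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right), \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right) \right\}.$$

Find the Fourier coefficients of (3, 4) relative to β.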


4. Let $S = \{(1, 0, i), (1, 2, 1)\}$ in $\mathbb{C}^3$. Compute $S^{\perp}$.


5. Let $S_0 = \{x_0\}$, where $x_0$ is a nonzero vector in $\mathbb{R}^3$. Describe $S_0^{\perp}$ geometrically. Now suppose that $S = \{x_1, x_2\}$ is a linearly independent subset of $\mathbb{R}^3$. Describe $S^{\perp}$ geometrically.


6. Let V be an inner product space, and let W be a finite-dimensional
subspace of V. If $x \notin W$, prove that there exists $y \in V$ such that
$y \in W^{\perp}$, but $\langle x, y\rangle \ne 0$. Hint: Use Theorem 6.6.


7. Let β be a basis for a subspace W of an inner product space V, and let
$z \in V$. Prove that $z \in W^{\perp}$ if and only if $\langle z, v\rangle = 0$ for every $v \in \beta$.


8. Prove that if $\{w_1, w_2, \ldots, w_n\}$ is an orthogonal set of nonzero vectors,
then the vectors $v_1, v_2, \ldots, v_n$ derived from the Gram–Schmidt process
satisfy $v_i = w_i$ for $i = 1, 2, \ldots, n$. Hint: Use mathematical induction.


9. Let $W = \mathrm{span}(\{(i, 0, 1)\})$ in $\mathbb{C}^3$. Find orthonormal bases for W and $W^{\perp}$.


10. Let W be a finite-dimensional subspace of an inner product space V.
Prove that there exists a projection T on W along $W^{\perp}$ that satisfies
$N(T) = W^{\perp}$. In addition, prove that $\|T(x)\| \le \|x\|$ for all $x \in V$.
Hint: Use Theorem 6.6 and Exercise 10 of Section 6.1. (Projections are
defined in the exercises of Section 2.1.)


11. Let A be an $n \times n$ matrix with complex entries. Prove that $AA^* = I$ if
and only if the rows of A form an orthonormal basis for $\mathbb{C}^n$.


12. Prove that for any matrix $A \in M_{m \times n}(F)$, $(R(L_{A^*}))^{\perp} = N(L_A)$.
13. Let V be an inner product space, S and $S_0$ be subsets of V, and W be a finite-dimensional subspace of V. Prove the following results.
(a) $S_0 \subseteq S$ implies that $S^{\perp} \subseteq S_0^{\perp}$.
(b) $S \subseteq (S^{\perp})^{\perp}$; so $\mathrm{span}(S) \subseteq (S^{\perp})^{\perp}$.
(c) $W = (W^{\perp})^{\perp}$. Hint: Use Exercise 6.
(d) $V = W \oplus W^{\perp}$. (See the exercises of Section 1.3.)

14. Let $W_1$ and $W_2$ be subspaces of a finite-dimensional inner product space. Prove that $(W_1 + W_2)^{\perp} = W_1^{\perp} \cap W_2^{\perp}$ and $(W_1 \cap W_2)^{\perp} = W_1^{\perp} + W_2^{\perp}$. (See the definition of the sum of subsets of a vector space on page 22.) Hint for the second equation: Apply Exercise 13(c) to the first equation.

15. Let V be a finite-dimensional inner product space over F.
(a) Parseval's Identity. Let $\{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis for V. For any $x, y \in V$ prove that

$$\langle x, y\rangle = \sum_{i=1}^{n} \langle x, v_i\rangle\,\overline{\langle y, v_i\rangle}.$$

(b) Use (a) to prove that if β is an orthonormal basis for V with inner product $\langle\cdot,\cdot\rangle$, then for any $x, y \in V$

$$\langle \phi_\beta(x), \phi_\beta(y)\rangle' = \langle [x]_\beta, [y]_\beta\rangle' = \langle x, y\rangle,$$

where $\langle\cdot,\cdot\rangle'$ is the standard inner product on $F^n$.

16. (a) Bessel's Inequality. Let V be an inner product space, and let $S = \{v_1, v_2, \ldots, v_n\}$ be an orthonormal subset of V. Prove that for any $x \in V$ we have

$$\|x\|^2 \ge \sum_{i=1}^{n} |\langle x, v_i\rangle|^2.$$

Hint: Apply Theorem 6.6 to $x \in V$ and $W = \mathrm{span}(S)$. Then use Exercise 10 of Section 6.1.
(b) In the context of (a), prove that Bessel's inequality is an equality if and only if $x \in \mathrm{span}(S)$.

17. Let T be a linear operator on an inner product space V. If $\langle T(x), y\rangle = 0$ for all $x, y \in V$, prove that $T = T_0$. In fact, prove this result if the equality holds for all x and y in some basis for V.

18. Let $V = C([-1, 1])$. Suppose that $W_e$ and $W_o$ denote the subspaces of V consisting of the even and odd functions, respectively. (See Exercise 22 of Section 1.3.) Prove that $W_e^{\perp} = W_o$, where the inner product on V is defined by

$$\langle f, g\rangle = \int_{-1}^{1} f(t)g(t)\, dt.$$

19. In each of the following parts, find the orthogonal projection of the given vector on the given subspace W of the inner product space V.
(a) $V = \mathbb{R}^2$, $u = (2, 6)$, and $W = \{(x, y) : y = 4x\}$.
(b) $V = \mathbb{R}^3$, $u = (2, 1, 3)$, and $W = \{(x, y, z) : \ldots\}$.
(c) $V = P(\mathbb{R})$ with the inner product $\langle f(x), g(x)\rangle = \ldots$, $h(x) = 4 + 3x - 2x^2$, and $W = \ldots$.

20. In each part of Exercise 19, find the distance from the given vector to the subspace W.

21. Let $V = C([-1, 1])$ with the inner product $\langle f, g\rangle = \int_{-1}^{1} f(t)g(t)\, dt$, and let W be the subspace $P_2(\mathbb{R})$, viewed as a space of functions. Use the orthonormal basis obtained in Example 5 to compute the "best" (closest) second-degree polynomial approximation of the function $h(t) = e^t$ on the interval $[-1, 1]$.

22. Let $V = C([0, 1])$ with the inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\, dt$. Let W be the subspace spanned by the linearly independent set $\{t, \sqrt{t}\}$.
(a) Find an orthonormal basis for W.
(b) Let $h(t) = t^2$. Use the orthonormal basis obtained in (a) to obtain the "best" (closest) approximation of h in W.

23. Let V be the vector space defined in Example 5 of Section 1.2, the space of all sequences σ in F (where F = R or F = C) such that $\sigma(n) \ne 0$ for only finitely many positive integers n. For $\sigma, \mu \in V$, we define $\langle \sigma, \mu\rangle = \sum_{n=1}^{\infty} \sigma(n)\overline{\mu(n)}$. Since all but a finite number of terms of the series are zero, the series converges.
(a) Prove that $\langle\cdot,\cdot\rangle$ is an inner product on V, and hence V is an inner product space.
(b) For each positive integer n, let $e_n$ be the sequence defined by $e_n(k) = \delta_{n,k}$, where $\delta_{n,k}$ is the Kronecker delta. Prove that $\{e_1, e_2, \ldots\}$ is an orthonormal basis for V.
(c) Let $\sigma_n = e_1 + e_n$ and $W = \mathrm{span}(\{\sigma_n : n \ge 2\})$.
(i) Prove that $e_1 \notin W$, so $W \ne V$.
(ii) Prove that $W^{\perp} = \{0\}$, and conclude that $W \ne (W^{\perp})^{\perp}$.
Thus the assumption in Exercise 13(c) that W is finite-dimensional is essential.


6.3 THE ADJOINT OF A LINEAR OPERATOR 


In Section 6.1, we defined the conjugate transpose $A^*$ of a matrix A. For
a linear operator T on an inner product space V, we now define a related
linear operator on V called the adjoint of T, whose matrix representation
with respect to any orthonormal basis β for V is $[T]_\beta^*$. The analogy between
conjugation of complex numbers and adjoints of linear operators will become
apparent. We first need a preliminary result.

Let V be an inner product space, and let $y \in V$. The function $g\colon V \to F$
defined by $g(x) = \langle x, y\rangle$ is clearly linear. More interesting is the fact that if
V is finite-dimensional, every linear transformation from V into F is of this
form.


Theorem 6.8. Let V be a finite-dimensional inner product space over F, and let $g\colon V \to F$ be a linear transformation. Then there exists a unique vector $y \in V$ such that $g(x) = \langle x, y\rangle$ for all $x \in V$.

Proof. Let $\beta = \{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis for V, and let

$$y = \sum_{i=1}^{n} \overline{g(v_i)}\, v_i.$$

Define $h\colon V \to F$ by $h(x) = \langle x, y\rangle$, which is clearly linear. Furthermore, for $1 \le j \le n$ we have

$$h(v_j) = \langle v_j, y\rangle = \Big\langle v_j, \sum_{i=1}^{n} \overline{g(v_i)}\, v_i \Big\rangle = \sum_{i=1}^{n} g(v_i)\langle v_j, v_i\rangle = \sum_{i=1}^{n} g(v_i)\,\delta_{ji} = g(v_j).$$

Since g and h both agree on β, we have that $g = h$ by the corollary to Theorem 2.6 (p. 73).

To show that y is unique, suppose that $g(x) = \langle x, y'\rangle$ for all x. Then $\langle x, y\rangle = \langle x, y'\rangle$ for all x; so by Theorem 6.1(e) (p. 333), we have $y = y'$. ∎


Example 1 


Define $g\colon \mathbb{R}^2 \to \mathbb{R}$ by $g(a_1, a_2) = 2a_1 + a_2$; clearly g is a linear transformation. Let $\beta = \{e_1, e_2\}$, and let $y = g(e_1)e_1 + g(e_2)e_2 = 2e_1 + e_2 = (2, 1)$, as in the proof of Theorem 6.8. Then $g(a_1, a_2) = \langle (a_1, a_2), (2, 1)\rangle = 2a_1 + a_2$.
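The proof of Theorem 6.8 is constructive: the representing vector is $y = \sum_i \overline{g(v_i)}\, v_i$. As a minimal sketch, assuming $F^n$ with the standard inner product $\langle x, y\rangle = \sum_k x_k \overline{y_k}$ and the standard basis as β, the code below builds y for the functional of Example 1 and for a second, hypothetical complex functional `g2` (our own illustration), where the conjugation in the formula actually matters.

```python
def inner(x, y):
    """Standard inner product on C^n: <x, y> = sum_k x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def representing_vector(g, basis):
    """Return y = sum_i conj(g(v_i)) v_i, as in the proof of Theorem 6.8.

    Assumes `basis` is an orthonormal basis of C^n and g is linear,
    so that g(x) = <x, y> for every x.
    """
    y = [0j] * len(basis[0])
    for v in basis:
        c = complex(g(v)).conjugate()
        y = [yk + c * vk for yk, vk in zip(y, v)]
    return y


std = [[1, 0], [0, 1]]  # standard (orthonormal) basis of F^2


def g(x):
    """The functional of Example 1: g(a1, a2) = 2 a1 + a2."""
    return 2 * x[0] + x[1]


def g2(x):
    """A hypothetical complex functional, for illustration only."""
    return 1j * x[0] + (1 - 2j) * x[1]


y = representing_vector(g, std)    # expected to equal (2, 1)
y2 = representing_vector(g2, std)  # expected to equal (-i, 1 + 2i)
x = [2 - 1j, 3 + 4j]
print(y, inner(x, y2), g2(x))
```

Omitting the conjugate would still work for the real functional of Example 1 but would give the wrong vector for `g2`, which is why the proof conjugates $g(v_i)$.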
Theorem 6.9. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Then there exists a unique function $T^*\colon V \to V$ such that $\langle T(x), y\rangle = \langle x, T^*(y)\rangle$ for all $x, y \in V$. Furthermore, $T^*$ is linear.

Proof. Let $y \in V$. Define $g\colon V \to F$ by $g(x) = \langle T(x), y\rangle$ for all $x \in V$. We first show that g is linear. Let $x_1, x_2 \in V$ and $c \in F$. Then

$$g(cx_1 + x_2) = \langle T(cx_1 + x_2), y\rangle = \langle cT(x_1) + T(x_2), y\rangle = c\langle T(x_1), y\rangle + \langle T(x_2), y\rangle = c\,g(x_1) + g(x_2).$$

Hence g is linear.

We now apply Theorem 6.8 to obtain a unique vector $y' \in V$ such that $g(x) = \langle x, y'\rangle$; that is, $\langle T(x), y\rangle = \langle x, y'\rangle$ for all $x \in V$. Defining $T^*\colon V \to V$ by $T^*(y) = y'$, we have $\langle T(x), y\rangle = \langle x, T^*(y)\rangle$.

To show that $T^*$ is linear, let $y_1, y_2 \in V$ and $c \in F$. Then for any $x \in V$, we have

$$\langle x, T^*(cy_1 + y_2)\rangle = \langle T(x), cy_1 + y_2\rangle = \bar{c}\langle T(x), y_1\rangle + \langle T(x), y_2\rangle = \bar{c}\langle x, T^*(y_1)\rangle + \langle x, T^*(y_2)\rangle = \langle x, cT^*(y_1) + T^*(y_2)\rangle.$$

Since x is arbitrary, $T^*(cy_1 + y_2) = cT^*(y_1) + T^*(y_2)$ by Theorem 6.1(e) (p. 333).

Finally, we need to show that $T^*$ is unique. Suppose that $U\colon V \to V$ is linear and that it satisfies $\langle T(x), y\rangle = \langle x, U(y)\rangle$ for all $x, y \in V$. Then $\langle x, T^*(y)\rangle = \langle x, U(y)\rangle$ for all $x, y \in V$, so $T^* = U$. ∎


The linear operator $T^*$ described in Theorem 6.9 is called the adjoint of the operator T. The symbol $T^*$ is read "T star."

Thus $T^*$ is the unique operator on V satisfying $\langle T(x), y\rangle = \langle x, T^*(y)\rangle$ for all $x, y \in V$. Note that we also have

$$\langle x, T(y)\rangle = \overline{\langle T(y), x\rangle} = \overline{\langle y, T^*(x)\rangle} = \langle T^*(x), y\rangle;$$

so $\langle x, T(y)\rangle = \langle T^*(x), y\rangle$ for all $x, y \in V$. We may view these equations symbolically as adding a * to T when shifting its position inside the inner product symbol.

For an infinite-dimensional inner product space, the adjoint of a linear operator T may be defined to be the function $T^*$ such that $\langle T(x), y\rangle = \langle x, T^*(y)\rangle$ for all $x, y \in V$, provided it exists. Although the uniqueness and linearity of $T^*$ follow as before, the existence of the adjoint is not guaranteed (see Exercise 24). The reader should observe the necessity of the hypothesis of finite-dimensionality in the proof of Theorem 6.8. Many of the theorems we prove about adjoints, nevertheless, do not depend on V being finite-dimensional. Thus, unless stated otherwise, for the remainder of this chapter we adopt the convention that a reference to the adjoint of a linear operator on an infinite-dimensional inner product space assumes its existence.

Theorem 6.10 is a useful result for computing adjoints. 


Theorem 6.10. Let V be a finite-dimensional inner product space, and let β be an orthonormal basis for V. If T is a linear operator on V, then

$$[T^*]_\beta = [T]_\beta^*.$$

Proof. Let $A = [T]_\beta$, $B = [T^*]_\beta$, and $\beta = \{v_1, v_2, \ldots, v_n\}$. Then from the corollary to Theorem 6.5 (p. 346), we have

$$B_{ij} = \langle T^*(v_j), v_i\rangle = \overline{\langle v_i, T^*(v_j)\rangle} = \overline{\langle T(v_i), v_j\rangle} = \overline{A_{ji}} = (A^*)_{ij}.$$

Hence $B = A^*$. ∎

Corollary. Let A be an $n \times n$ matrix. Then $L_{A^*} = (L_A)^*$.

Proof. If β is the standard ordered basis for $F^n$, then, by Theorem 2.16 (p. 93), we have $[L_A]_\beta = A$. Hence $[(L_A)^*]_\beta = [L_A]_\beta^* = A^* = [L_{A^*}]_\beta$, and so $(L_A)^* = L_{A^*}$. ∎


As an illustration of Theorem 6.10, we compute the adjoint of a specific 
linear operator. 


Example 2 


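Let T be the linear operator on $\mathbb{C}^2$ defined by $T(a_1, a_2) = (2ia_1 + 3a_2,\ a_1 - a_2)$. If β is the standard ordered basis for $\mathbb{C}^2$, then

$$[T]_\beta = \begin{pmatrix} 2i & 3 \\ 1 & -1 \end{pmatrix}.$$

So

$$[T^*]_\beta = [T]_\beta^* = \begin{pmatrix} -2i & 1 \\ 3 & -1 \end{pmatrix}.$$

Hence

$$T^*(a_1, a_2) = (-2ia_1 + a_2,\ 3a_1 - a_2).$$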
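Theorem 6.10 reduces computing an adjoint to forming a conjugate transpose, provided the basis is orthonormal. The sketch below, a minimal illustration assuming $\mathbb{C}^2$ with the standard inner product, forms $[T]_\beta^*$ for the matrix of Example 2 and then spot-checks the defining identity $\langle T(x), y\rangle = \langle x, T^*(y)\rangle$ on a few pseudo-random vectors. The helper names and the fixed random seed are our own choices.

```python
import random


def inner(x, y):
    """Standard inner product on C^n: <x, y> = sum_k x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def matvec(M, x):
    """Return M x for a matrix given as a list of rows."""
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]


def conjugate_transpose(M):
    """Return M*: transpose M and conjugate every entry."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]


T = [[2j, 3], [1, -1]]           # [T]_beta from Example 2
T_star = conjugate_transpose(T)  # expected: [[-2i, 1], [3, -1]]
print(T_star)

rng = random.Random(0)


def rand_vec(n):
    """A pseudo-random vector in C^n, used only for spot checks."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


for _ in range(5):
    x, y = rand_vec(2), rand_vec(2)
    lhs = inner(matvec(T, x), y)
    rhs = inner(x, matvec(T_star, y))
    assert abs(lhs - rhs) < 1e-12
```

The check only samples a few vectors; it illustrates, rather than proves, the identity that Theorem 6.10 guarantees for an orthonormal basis.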
The following theorem suggests an analogy between the conjugates of 
complex numbers and the adjoints of linear operators. 


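Theorem 6.11. Let V be an inner product space, and let T and U be linear operators on V. Then
(a) $(T + U)^* = T^* + U^*$;
(b) $(cT)^* = \bar{c}\,T^*$ for any $c \in F$;
(c) $(TU)^* = U^*T^*$;
(d) $T^{**} = T$;
(e) $I^* = I$.

Proof. We prove (a) and (d); the rest are proved similarly. Let $x, y \in V$.
(a) Because

$$\langle (T + U)(x), y\rangle = \langle T(x) + U(x), y\rangle = \langle T(x), y\rangle + \langle U(x), y\rangle = \langle x, T^*(y)\rangle + \langle x, U^*(y)\rangle = \langle x, T^*(y) + U^*(y)\rangle = \langle x, (T^* + U^*)(y)\rangle,$$

$T^* + U^*$ has the property unique to $(T + U)^*$. Hence $T^* + U^* = (T + U)^*$.
(d) Similarly, since

$$\langle x, T(y)\rangle = \langle T^*(x), y\rangle = \langle x, T^{**}(y)\rangle,$$

(d) follows. ∎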


The same proof works in the infinite-dimensional case, provided that the 
existence of T* and U* is assumed. 


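Corollary. Let A and B be $n \times n$ matrices. Then
(a) $(A + B)^* = A^* + B^*$;
(b) $(cA)^* = \bar{c}A^*$ for all $c \in F$;
(c) $(AB)^* = B^*A^*$;
(d) $A^{**} = A$;
(e) $I^* = I$.

Proof. We prove only (c); the remaining parts can be proved similarly. Since

$$L_{(AB)^*} = (L_{AB})^* = (L_A L_B)^* = (L_B)^*(L_A)^* = L_{B^*}L_{A^*} = L_{B^*A^*},$$

we have $(AB)^* = B^*A^*$. ∎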


In the preceding proof, we relied on the corollary to Theorem 6.10. An 
alternative proof, which holds even for nonsquare matrices, can be given by 
appealing directly to the definition of the conjugate transpose of a matrix 
(see Exercise 5). 


Least Squares Approximation 


Consider the following problem: An experimenter collects data by taking measurements $y_1, y_2, \ldots, y_m$ at times $t_1, t_2, \ldots, t_m$, respectively. For example, he or she may be measuring unemployment at various times during some period. Suppose that the data $(t_1, y_1), (t_2, y_2), \ldots, (t_m, y_m)$ are plotted as points in the plane. (See Figure 6.3.) From this plot, the experimenter feels that there exists an essentially linear relationship between y and t, say $y = ct + d$, and would like to find the constants c and d so that the line $y = ct + d$ represents the best possible fit to the data collected. One such estimate of fit is to calculate the error E that represents the sum of the squares of the vertical distances from the points to the line; that is,

$$E = \sum_{i=1}^{m} (y_i - ct_i - d)^2.$$

Figure 6.3 (labels: the point $(t_i, ct_i + d)$ and the line $y = ct + d$)

Thus the problem is reduced to finding the constants c and d that minimize E. (For this reason the line $y = ct + d$ is called the least squares line.) If we let

$$A = \begin{pmatrix} t_1 & 1 \\ t_2 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{pmatrix}, \qquad x = \begin{pmatrix} c \\ d \end{pmatrix}, \qquad\text{and}\qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix},$$

then it follows that $E = \|y - Ax\|^2$.

We develop a general method for finding an explicit vector $x_0 \in F^n$ that minimizes E; that is, given an $m \times n$ matrix A, we find $x_0 \in F^n$ such that $\|y - Ax_0\| \le \|y - Ax\|$ for all vectors $x \in F^n$. This method not only allows us to find the linear function that best fits the data, but also, for any positive integer n, the best fit using a polynomial of degree at most n.

First, we need some notation and two simple lemmas. For $x, y \in F^n$, let $\langle x, y\rangle_n$ denote the standard inner product of x and y in $F^n$. Recall that if x and y are regarded as column vectors, then $\langle x, y\rangle_n = y^*x$.


Lemma 1. Let $A \in M_{m \times n}(F)$, $x \in F^n$, and $y \in F^m$. Then

$$\langle Ax, y\rangle_m = \langle x, A^*y\rangle_n.$$

Proof. By a generalization of the corollary to Theorem 6.11 (see Exercise 5(b)), we have

$$\langle Ax, y\rangle_m = y^*(Ax) = (y^*A)x = (A^*y)^*x = \langle x, A^*y\rangle_n. \quad ∎$$

Lemma 2. Let $A \in M_{m \times n}(F)$. Then $\mathrm{rank}(A^*A) = \mathrm{rank}(A)$.

Proof. By the dimension theorem, we need only show that, for $x \in F^n$, we have $A^*Ax = 0$ if and only if $Ax = 0$. Clearly, $Ax = 0$ implies that $A^*Ax = 0$. So assume that $A^*Ax = 0$. Then

$$0 = \langle A^*Ax, x\rangle_n = \langle Ax, A^{**}x\rangle_m = \langle Ax, Ax\rangle_m,$$

so that $Ax = 0$. ∎


Corollary. If A is an m x n matrix such that rank(A) = n, then A*A is 
invertible. 


Now let A be an $m \times n$ matrix and $y \in F^m$. Define $W = \{Ax : x \in F^n\}$; that is, $W = R(L_A)$. By the corollary to Theorem 6.6 (p. 350), there exists a unique vector in W that is closest to y. Call this vector $Ax_0$, where $x_0 \in F^n$. Then $\|Ax_0 - y\| \le \|Ax - y\|$ for all $x \in F^n$; so $x_0$ has the property that $E = \|Ax_0 - y\|^2$ is minimal, as desired.

To develop a practical method for finding such an $x_0$, we note from Theorem 6.6 and its corollary that $Ax_0 - y \in W^{\perp}$; so $\langle Ax, Ax_0 - y\rangle_m = 0$ for all $x \in F^n$. Thus, by Lemma 1, we have that $\langle x, A^*(Ax_0 - y)\rangle_n = 0$ for all $x \in F^n$; that is, $A^*(Ax_0 - y) = 0$. So we need only find a solution $x_0$ to $A^*Ax = A^*y$. If, in addition, we assume that $\mathrm{rank}(A) = n$, then $A^*A$ is invertible by the corollary to Lemma 2, and we have $x_0 = (A^*A)^{-1}A^*y$. We summarize this discussion in the following theorem.

Theorem 6.12. Let $A \in M_{m \times n}(F)$ and $y \in F^m$. Then there exists $x_0 \in F^n$ such that $(A^*A)x_0 = A^*y$ and $\|Ax_0 - y\| \le \|Ax - y\|$ for all $x \in F^n$. Furthermore, if $\mathrm{rank}(A) = n$, then $x_0 = (A^*A)^{-1}A^*y$.
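To return to our experimenter, let us suppose that the data collected are $(1, 2)$, $(2, 3)$, $(3, 5)$, and $(4, 7)$. Then

$$A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{pmatrix} \qquad\text{and}\qquad y = \begin{pmatrix} 2 \\ 3 \\ 5 \\ 7 \end{pmatrix};$$

hence

$$A^*A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{pmatrix} = \begin{pmatrix} 30 & 10 \\ 10 & 4 \end{pmatrix}.$$

Thus

$$(A^*A)^{-1} = \frac{1}{20}\begin{pmatrix} 4 & -10 \\ -10 & 30 \end{pmatrix}.$$

Therefore

$$x_0 = \begin{pmatrix} c \\ d \end{pmatrix} = \frac{1}{20}\begin{pmatrix} 4 & -10 \\ -10 & 30 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \\ 5 \\ 7 \end{pmatrix} = \begin{pmatrix} 1.7 \\ 0 \end{pmatrix}.$$

It follows that the line $y = 1.7t$ is the least squares line. The error E may be computed directly as $\|Ax_0 - y\|^2 = 0.3$.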
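The computation above can be reproduced by solving the normal equations $A^*Ax = A^*y$ of Theorem 6.12 directly. The sketch below assumes real data, so $A^*$ is simply the transpose, and uses `fractions.Fraction` with a small Gauss–Jordan solver (our own helper, which assumes an invertible coefficient matrix, as holds when $\mathrm{rank}(A) = n$). The names `solve` and `least_squares_line` are hypothetical.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination in exact arithmetic.

    Assumes M is square and invertible.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def least_squares_line(ts, ys):
    """Return [c, d] solving the normal equations A*A x = A*y (Theorem 6.12).

    Real data are assumed, so A* is the transpose of A.
    """
    A = [[t, 1] for t in ts]
    At = transpose(A)
    AtA = matmul(At, A)
    Aty = [row[0] for row in matmul(At, [[y] for y in ys])]
    return solve(AtA, Aty)


ts, ys = [1, 2, 3, 4], [2, 3, 5, 7]
c, d = least_squares_line(ts, ys)
E = sum((c * t + d - y) ** 2 for t, y in zip(ts, ys))
print(c, d, E)  # 17/10 0 3/10
```

The printed values $c = 17/10$, $d = 0$, $E = 3/10$ agree with the text. Solving the system rather than forming $(A^*A)^{-1}$ explicitly is a design choice here; both give the same $x_0$ when $A^*A$ is invertible.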
Suppose that the experimenter chose the times $t_i$ $(1 \le i \le m)$ to satisfy

$$\sum_{i=1}^{m} t_i = 0.$$

Then the two columns of A would be orthogonal, so $A^*A$ would be a diagonal matrix (see Exercise 19). In this case, the computations are greatly simplified.

In practice, the m x 2 matrix A in our least squares application has rank 
equal to two, and hence A*A is invertible by the corollary to Lemma 2. For, 
otherwise, the first column of A is a multiple of the second column, which 
consists only of ones. But this would occur only if the experimenter collects 
all the data at exactly one time. 

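Finally, the method above may also be applied if, for some $k$, the experimenter wants to fit a polynomial of degree at most $k$ to the data. For instance, if a polynomial $y = ct^2 + dt + e$ of degree at most 2 is desired, the appropriate model is

$$x = \begin{pmatrix} c \\ d \\ e \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}, \quad\text{and}\quad A = \begin{pmatrix} t_1^2 & t_1 & 1 \\ \vdots & \vdots & \vdots \\ t_m^2 & t_m & 1 \end{pmatrix}.$$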
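The same normal-equation approach handles the quadratic model. The sketch below assumes exact integer or rational data and a matrix $A$ of full column rank (at least $k + 1$ distinct times); the name `polyfit_normal` is hypothetical. It builds the rows $(t_i^k, \ldots, t_i, 1)$ for a general degree bound $k$ and solves $(A^*A)x = A^*y$ by Gauss–Jordan elimination.

```python
from fractions import Fraction


def polyfit_normal(ts, ys, k):
    """Least squares coefficients [c_k, ..., c_1, c_0] of a polynomial of degree <= k.

    Row i of A is (t_i^k, ..., t_i, 1); for k = 2 this is the model y = c t^2 + d t + e.
    Assumes A has rank k + 1.
    """
    n = k + 1
    A = [[Fraction(t) ** p for p in range(k, -1, -1)] for t in ts]
    # Augmented normal equations [A^t A | A^t y].
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * Fraction(y) for row, y in zip(A, ys))]
         for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Data lying exactly on y = t^2 + 1, so the fit should recover c = 1, d = 0, e = 1.
print(polyfit_normal([-1, 0, 1, 2], [2, 1, 2, 5], 2))
```

With $k = 1$ the same function reproduces the least squares line of the experimenter's example.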
Minimal Solutions to Systems of Linear Equations


Even when a system of linear equations $Ax = b$ is consistent, there may be no unique solution. In such cases, it may be desirable to find a solution of minimal norm. A solution $s$ to $Ax = b$ is called a minimal solution if $\|s\| \le \|u\|$ for all other solutions $u$. The next theorem assures that every consistent system of linear equations has a unique minimal solution and provides a method for computing it.


Theorem 6.13. Let $A \in M_{m\times n}(F)$ and $b \in F^m$. Suppose that $Ax = b$ is consistent. Then the following statements are true.
(a) There exists exactly one minimal solution $s$ of $Ax = b$, and $s \in R(L_{A^*})$.
(b) The vector $s$ is the only solution to $Ax = b$ that lies in $R(L_{A^*})$; that is, if $u$ satisfies $(AA^*)u = b$, then $s = A^*u$.


Proof. (a) For simplicity of notation, we let $W = R(L_{A^*})$ and $W' = N(L_A)$. Let $x$ be any solution to $Ax = b$. By Theorem 6.6 (p. 350), $x = s + y$ for some $s \in W$ and $y \in W^\perp$. But $W^\perp = W'$ by Exercise 12, and therefore $b = Ax = As + Ay = As$. So $s$ is a solution to $Ax = b$ that lies in $W$. To prove (a), we need only show that $s$ is the unique minimal solution. Let $v$ be any solution to $Ax = b$. By Theorem 3.9 (p. 172), we have that $v = s + u$, where $u \in W'$. Since $s \in W$, which equals $W'^\perp$ by Exercise 12, we have

$$\|v\|^2 = \|s + u\|^2 = \|s\|^2 + \|u\|^2 \ge \|s\|^2$$

by Exercise 10 of Section 6.1. Thus $s$ is a minimal solution. We can also see from the preceding calculation that if $\|v\| = \|s\|$, then $u = 0$; hence $v = s$. Therefore $s$ is the unique minimal solution to $Ax = b$, proving (a).

(b) Assume that $v$ is also a solution to $Ax = b$ that lies in $W$. Then

$$v - s \in W \cap W' = W \cap W^\perp = \{0\};$$

so $v = s$.

Finally, suppose that $(AA^*)u = b$, and let $v = A^*u$. Then $v \in W$ and $Av = b$. Therefore $s = v = A^*u$ by the discussion above. ∎
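Example 3

Consider the system

$$\begin{aligned} x + 2y + z &= 4 \\ x - y + 2z &= -11 \\ x + 5y &= 19. \end{aligned}$$

Let

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 2 \\ 1 & 5 & 0 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 4 \\ -11 \\ 19 \end{pmatrix}.$$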
To find the minimal solution to this system, we must first find some solution $u$ to $AA^*x = b$. Now

$$AA^* = \begin{pmatrix} 6 & 1 & 11 \\ 1 & 6 & -4 \\ 11 & -4 & 26 \end{pmatrix},$$

so we consider the system

$$\begin{aligned} 6x + y + 11z &= 4 \\ x + 6y - 4z &= -11 \\ 11x - 4y + 26z &= 19, \end{aligned}$$

for which one solution is

$$u = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}.$$

(Any solution will suffice.) Hence

$$s = A^*u = \begin{pmatrix} -1 \\ 4 \\ -3 \end{pmatrix}$$

is the minimal solution to the given system.
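The two-step recipe of Theorem 6.13(b), first any solution $u$ of $(AA^*)u = b$ and then $s = A^*u$, can be sketched in Python as follows. It assumes real entries (so $A^* = A^t$) and that $Ax = b$ is consistent; the names `particular_solution` and `minimal_solution` are illustrative. Because $AA^*$ may be singular, as it is in Example 3, the helper sets free variables to zero instead of inverting; by the theorem, any choice of $u$ yields the same $s$.

```python
from fractions import Fraction


def particular_solution(M, b):
    """One solution of a consistent system M u = b, with every free variable set to 0."""
    rows, cols = len(M), len(M[0])
    aug = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(M, b)]
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if aug[i][c] != 0), None)
        if piv is None:
            continue                      # no pivot: column c is a free variable
        aug[r], aug[piv] = aug[piv], aug[r]
        p = aug[r][c]
        aug[r] = [x / p for x in aug[r]]
        for i in range(rows):             # reduce column c to a unit vector
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    u = [Fraction(0)] * cols
    for i, c in enumerate(pivots):
        u[c] = aug[i][cols]
    return u


def minimal_solution(A, b):
    """Minimal solution s = A^t u of A x = b, where (A A^t) u = b (real A, consistent system)."""
    m, n = len(A), len(A[0])
    AAt = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)] for i in range(m)]
    u = particular_solution(AAt, b)
    return [sum(A[i][k] * u[i] for i in range(m)) for k in range(n)]


# Example 3: x + 2y + z = 4, x - y + 2z = -11, x + 5y = 19.
A = [[1, 2, 1], [1, -1, 2], [1, 5, 0]]
print(minimal_solution(A, [4, -11, 19]))  # [Fraction(-1, 1), Fraction(4, 1), Fraction(-3, 1)]
```

Here the helper finds exactly the solution $u = (1, -2, 0)$ used in the text, because the third variable of $AA^*u = b$ is free.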


EXERCISES 


1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.
(a) Every linear operator has an adjoint.
(b) Every linear operator on V has the form $x \mapsto \langle x, y\rangle$ for some $y \in V$.
(c) For every linear operator T on V and every ordered basis $\beta$ for V, we have $[T^*]_\beta = ([T]_\beta)^*$.
(d) The adjoint of a linear operator is unique.
(e) For any linear operators T and U and scalars a and b,

$$(aT + bU)^* = aT^* + bU^*.$$

(f) For any $n \times n$ matrix A, we have $(L_A)^* = L_{A^*}$.
(g) For any linear operator T, we have $(T^*)^* = T$.

2. For each of the following inner product spaces V (over F) and linear transformations $g\colon V \to F$, find a vector y such that $g(x) = \langle x, y\rangle$ for all $x \in V$.
(a) $V = \mathbb{R}^3$, $g(a_1, a_2, a_3) = a_1 - 2a_2 + 4a_3$
(b) $V = \mathbb{C}^2$, $g(z_1, z_2) = z_1 - 2z_2$
(c) $V = P_2(\mathbb{R})$ with $\langle f, h\rangle = \int_0^1 f(t)h(t)\,dt$, $g(f) = f(0) + f'(1)$


3. For each of the following inner product spaces V and linear operators T on V, evaluate $T^*$ at the given vector in V.
(a) $V = \mathbb{R}^2$, $T(a, b) = (2a + b, a - 3b)$, $x = (3, 5)$.
(b) $V = \mathbb{C}^2$, $T(z_1, z_2) = (2z_1 + iz_2, (1 - i)z_1)$, $x = (3 - i, 1 + 2i)$.
(c) $V = P_1(\mathbb{R})$ with $\langle f, g\rangle = \int_{-1}^{1} f(t)g(t)\,dt$, $T(f) = f' + 3f$, $f(t) = 4 - 2t$.
4. Complete the proof of Theorem 6.11.

5. (a) Complete the proof of the corollary to Theorem 6.11 by using Theorem 6.11, as in the proof of (c).
(b) State a result for nonsquare matrices that is analogous to the corollary to Theorem 6.11, and prove it using a matrix argument.

6. Let T be a linear operator on an inner product space V. Let $U_1 = T + T^*$ and $U_2 = TT^*$. Prove that $U_1 = U_1^*$ and $U_2 = U_2^*$.

7. Give an example of a linear operator T on an inner product space V such that $N(T) \ne N(T^*)$.

8. Let V be a finite-dimensional inner product space, and let T be a linear operator on V. Prove that if T is invertible, then $T^*$ is invertible and $(T^*)^{-1} = (T^{-1})^*$.

9. Prove that if $V = W \oplus W^\perp$ and T is the projection on W along $W^\perp$, then $T = T^*$. Hint: Recall that $N(T) = W^\perp$. (For definitions, see the exercises of Sections 1.3 and 2.1.)

10. Let T be a linear operator on an inner product space V. Prove that $\|T(x)\| = \|x\|$ for all $x \in V$ if and only if $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$. Hint: Use Exercise 20 of Section 6.1.

11. For a linear operator T on an inner product space V, prove that $T^*T = T_0$ implies $T = T_0$. Is the same result true if we assume that $TT^* = T_0$?

12. Let V be an inner product space, and let T be a linear operator on V. Prove the following results.
(a) $R(T^*)^\perp = N(T)$.
(b) If V is finite-dimensional, then $R(T^*) = N(T)^\perp$. Hint: Use Exercise 13(c) of Section 6.2.


13. Let T be a linear operator on a finite-dimensional inner product space V. Prove the following results.
(a) $N(T^*T) = N(T)$. Deduce that $\operatorname{rank}(T^*T) = \operatorname{rank}(T)$.
(b) $\operatorname{rank}(T) = \operatorname{rank}(T^*)$. Deduce from (a) that $\operatorname{rank}(TT^*) = \operatorname{rank}(T)$.
(c) For any $n \times n$ matrix A, $\operatorname{rank}(A^*A) = \operatorname{rank}(AA^*) = \operatorname{rank}(A)$.

14. Let V be an inner product space, and let $y, z \in V$. Define $T\colon V \to V$ by $T(x) = \langle x, y\rangle z$ for all $x \in V$. First prove that T is linear. Then show that $T^*$ exists, and find an explicit expression for it.

The following definition is used in Exercises 15–17 and is an extension of the definition of the adjoint of a linear operator.

Definition. Let $T\colon V \to W$ be a linear transformation, where V and W are finite-dimensional inner product spaces with inner products $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$, respectively. A function $T^*\colon W \to V$ is called an adjoint of T if $\langle T(x), y\rangle_2 = \langle x, T^*(y)\rangle_1$ for all $x \in V$ and $y \in W$.




15. Let $T\colon V \to W$ be a linear transformation, where V and W are finite-dimensional inner product spaces with inner products $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$, respectively. Prove the following results.
(a) There is a unique adjoint $T^*$ of T, and $T^*$ is linear.
(b) If $\beta$ and $\gamma$ are orthonormal bases for V and W, respectively, then $[T^*]_\gamma^\beta = ([T]_\beta^\gamma)^*$.
(c) $\operatorname{rank}(T^*) = \operatorname{rank}(T)$.
(d) $\langle T^*(x), y\rangle_1 = \langle x, T(y)\rangle_2$ for all $x \in W$ and $y \in V$.
(e) For all $x \in V$, $T^*T(x) = 0$ if and only if $T(x) = 0$.

16. State and prove a result that extends the first four parts of Theorem 6.11 using the preceding definition.

17. Let $T\colon V \to W$ be a linear transformation, where V and W are finite-dimensional inner product spaces. Prove that $(R(T^*))^\perp = N(T)$, using the preceding definition.

18. Let A be an $n \times n$ matrix. Prove that $\det(A^*) = \overline{\det(A)}$.

19. Suppose that A is an $m \times n$ matrix in which no two columns are identical. Prove that $A^*A$ is a diagonal matrix if and only if every pair of columns of A is orthogonal.

20. For each of the sets of data that follows, use the least squares approximation to find the best fits with both (i) a linear function and (ii) a quadratic function. Compute the error E in both cases.
(a) $\{(-3, 9), (-2, 6), (0, 2), (1, 1)\}$
(b) $\{(1, 2), (3, 4), (5, 7), (7, 9), (9, 12)\}$
(c) $\{(-2, 4), (-1, 3), (0, 1), (1, -1), (2, -3)\}$


21. In physics, Hooke's law states that (within certain limits) there is a linear relationship between the length x of a spring and the force y applied to (or exerted by) the spring. That is, $y = cx + d$, where c is called the spring constant. Use the following data to estimate the spring constant (the length is given in inches and the force is given in pounds).

Length x (in.)   Force y (lb)
3.5              1.0
4.0              2.2
4.5              2.8
5.0              4.3

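22. Find the minimal solution to each of the following systems of linear equations.
(a) $x + 2y - z = 12$
(b) $$\begin{aligned} x + 2y - z &= 1 \\ 2x + 3y + z &= 2 \\ 4x + 7y - z &= 4 \end{aligned}$$
(c) $$\begin{aligned} x + y - z &= 0 \\ 2x - y + z &= 3 \\ x - y + z &= 2 \end{aligned}$$
(d) $$\begin{aligned} x + y + z - w &= 1 \\ 2x - y + w &= 1 \end{aligned}$$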


23. Consider the problem of finding the least squares line $y = ct + d$ corresponding to the m observations $(t_1, y_1), (t_2, y_2), \ldots, (t_m, y_m)$.
(a) Show that the equation $(A^*A)x_0 = A^*y$ of Theorem 6.12 takes the form of the normal equations:

$$\left(\sum_{i=1}^{m} t_i^2\right)c + \left(\sum_{i=1}^{m} t_i\right)d = \sum_{i=1}^{m} t_i y_i$$

and

$$\left(\sum_{i=1}^{m} t_i\right)c + md = \sum_{i=1}^{m} y_i.$$

These equations may also be obtained from the error E by setting the partial derivatives of E with respect to both c and d equal to zero.
(b) Use the second normal equation of (a) to show that the least squares line must pass through the center of mass, $(\bar{t}, \bar{y})$, where

$$\bar{t} = \frac{1}{m}\sum_{i=1}^{m} t_i \quad\text{and}\quad \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i.$$

24. Let V and $\{e_1, e_2, \ldots\}$ be defined as in Exercise 23 of Section 6.2. Define $T\colon V \to V$ by

$$T(\sigma)(k) = \sum_{i=k}^{\infty} \sigma(i) \quad\text{for every positive integer } k.$$

Notice that the infinite series in the definition of T converges because $\sigma(i) \ne 0$ for only finitely many i.
(a) Prove that T is a linear operator on V.
(b) Prove that for any positive integer n, $T(e_n) = \sum_{i=1}^{n} e_i$.
(c) Prove that T has no adjoint. Hint: By way of contradiction, suppose that $T^*$ exists. Prove that for any positive integer n, $T^*(e_n)(k) \ne 0$ for infinitely many k.


6.4 NORMAL AND SELF-ADJOINT OPERATORS 


We have seen the importance of diagonalizable operators in Chapter 5. For 
these operators, it is necessary and sufficient for the vector space V to possess 
a basis of eigenvectors. As V is an inner product space in this chapter, it 
is reasonable to seek conditions that guarantee that V has an orthonormal 
basis of eigenvectors. A very important result that helps achieve our goal is 
Schur’s theorem (Theorem 6.14). The formulation that follows is in terms of 
linear operators. The next section contains the more familiar matrix form. 
We begin with a lemma. 


Lemma. Let T be a linear operator on a finite-dimensional inner product 
space V. If T has an eigenvector, then so does T*. 


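Proof. Suppose that v is an eigenvector of T with corresponding eigenvalue $\lambda$. Then for any $x \in V$,

$$0 = \langle 0, x\rangle = \langle (T - \lambda I)(v), x\rangle = \langle v, (T - \lambda I)^*(x)\rangle = \langle v, (T^* - \overline{\lambda}I)(x)\rangle,$$

and hence v is orthogonal to the range of $T^* - \overline{\lambda}I$. Since $v \ne 0$, this range is not all of V; so $T^* - \overline{\lambda}I$ is not onto and hence, V being finite-dimensional, is not one-to-one. Thus $T^* - \overline{\lambda}I$ has a nonzero null space, and any nonzero vector in this null space is an eigenvector of $T^*$ with corresponding eigenvalue $\overline{\lambda}$. ∎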
Recall (see the exercises of Section 2.1 and see Section 5.4) that a subspace W of V is said to be T-invariant if T(W) is contained in W. If W is T-invariant, we may define the restriction $T_W\colon W \to W$ by $T_W(x) = T(x)$ for all $x \in W$. It is clear that $T_W$ is a linear operator on W. Recall from Section 5.2 that a polynomial is said to split if it factors into linear polynomials.


Theorem 6.14 (Schur). Let T be a linear operator on a finite-dimensional inner product space V. Suppose that the characteristic polynomial of T splits. Then there exists an orthonormal basis $\beta$ for V such that the matrix $[T]_\beta$ is upper triangular.


Proof. The proof is by mathematical induction on the dimension $n$ of V. The result is immediate if $n = 1$. So suppose that the result is true for linear operators on $(n-1)$-dimensional inner product spaces whose characteristic polynomials split. By the lemma, we can assume that $T^*$ has a unit eigenvector $z$. Suppose that $T^*(z) = \lambda z$ and that $W = \operatorname{span}(\{z\})$. We show that $W^\perp$ is T-invariant. If $y \in W^\perp$ and $x = cz \in W$, then

$$\langle T(y), x\rangle = \langle T(y), cz\rangle = \langle y, T^*(cz)\rangle = \langle y, cT^*(z)\rangle = \langle y, c\lambda z\rangle = \overline{c\lambda}\,\langle y, z\rangle = \overline{c\lambda}\cdot 0 = 0.$$

So $T(y) \in W^\perp$. It is easy to show (see Theorem 5.21 p. 314, or as a consequence of Exercise 6 of Section 4.4) that the characteristic polynomial of $T_{W^\perp}$ divides the characteristic polynomial of T and hence splits. By Theorem 6.7(c) (p. 352), $\dim(W^\perp) = n - 1$, so we may apply the induction hypothesis to $T_{W^\perp}$ and obtain an orthonormal basis $\gamma$ of $W^\perp$ such that $[T_{W^\perp}]_\gamma$ is upper triangular. Clearly, $\beta = \gamma \cup \{z\}$ is an orthonormal basis for V such that $[T]_\beta$ is upper triangular. ∎
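For $2 \times 2$ complex matrices the conclusion of Schur's theorem can be checked directly. The sketch below is a swapped-in, non-inductive variant rather than the proof above: it takes an eigenvector $v$ of $A$ itself (one exists because the characteristic polynomial splits over $\mathbb{C}$), normalizes it, completes it to an orthonormal basis $\{v, w\}$ of $\mathbb{C}^2$, and returns the unitary matrix $U$ with columns $v, w$ together with $R = U^*AU$, whose $(2,1)$ entry is $w^*Av = \lambda\, w^*v = 0$. The function name `schur_2x2` and the floating-point tolerance are assumptions.

```python
import cmath


def schur_2x2(A):
    """Return (U, R): U unitary with R = U* A U upper triangular (2x2 complex A).

    A direct 2x2 variant (an assumption of this sketch), not the inductive proof.
    """
    (a, b), (c, d) = A
    # One root of t^2 - (a + d) t + (ad - bc), the characteristic polynomial.
    lam = ((a + d) + cmath.sqrt((a - d) ** 2 + 4 * b * c)) / 2
    if b != 0:
        v = (b, lam - a)          # (A - lam I) v = 0 when b != 0
    elif c != 0:
        v = (lam - d, c)          # lower triangular case; lam is a or d
    else:
        v = (1, 0)                # A is diagonal, so e_1 is an eigenvector
    nv = (abs(v[0]) ** 2 + abs(v[1]) ** 2) ** 0.5
    v = (v[0] / nv, v[1] / nv)
    w = (-v[1].conjugate(), v[0].conjugate())   # unit vector orthogonal to v
    U = [[v[0], w[0]], [v[1], w[1]]]            # columns are v and w
    R = [[sum(U[k][i].conjugate() * A[k][l] * U[l][j]
              for k in range(2) for l in range(2))
          for j in range(2)] for i in range(2)]
    return U, R


U, R = schur_2x2([[0, -1], [1, 0]])   # rotation by pi/2; eigenvalues i and -i
print(abs(R[1][0]) < 1e-12)            # True: R is upper triangular
```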


We now return to our original goal of finding an orthonormal basis of eigenvectors of a linear operator T on a finite-dimensional inner product space V. Note that if such an orthonormal basis $\beta$ exists, then $[T]_\beta$ is a diagonal matrix, and hence $[T^*]_\beta = [T]_\beta^*$ is also a diagonal matrix. Because diagonal matrices commute, we conclude that T and $T^*$ commute. Thus if V possesses an orthonormal basis of eigenvectors of T, then $TT^* = T^*T$.


Definitions. Let V be an inner product space, and let T be a linear operator on V. We say that T is normal if $TT^* = T^*T$. An $n \times n$ real or complex matrix A is normal if $AA^* = A^*A$.

It follows immediately from Theorem 6.10 (p. 359) that T is normal if and only if $[T]_\beta$ is normal, where $\beta$ is an orthonormal basis.
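A quick numerical test of the definition, written as a Python sketch with matrices stored as nested lists; the helper names and the tolerance `tol` are illustrative assumptions. It forms $AA^*$ and $A^*A$ and compares them entrywise. The sample matrices are a rotation matrix as in Example 1, a real skew-symmetric matrix as in Example 2, the complex symmetric matrix of Example 4, and the non-normal matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.

```python
import math


def conj_transpose(A):
    """A*: the transpose of A with every entry conjugated."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_normal(A, tol=1e-12):
    """True if A A* and A* A agree entrywise to within tol."""
    As = conj_transpose(A)
    L, R = matmul(A, As), matmul(As, A)
    n = len(A)
    return all(abs(L[i][j] - R[i][j]) <= tol for i in range(n) for j in range(n))


theta = 0.7
rotation = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
print(is_normal(rotation))               # True
print(is_normal([[0, 2], [-2, 0]]))      # True: skew-symmetric
print(is_normal([[1j, 1j], [1j, 1]]))    # False: complex symmetric, not normal
print(is_normal([[1, 1], [0, 1]]))       # False
```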
Example 1

Let $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ be rotation by $\theta$, where $0 < \theta < \pi$. The matrix representation of T in the standard ordered basis is given by

$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

Note that $AA^* = I = A^*A$; so A, and hence T, is normal.


Example 2 


Suppose that A is a real skew-symmetric matrix; that is, $A^t = -A$. Then A is normal because both $AA^t$ and $A^tA$ are equal to $-A^2$.


Clearly, the operator T in Example 1 does not even possess one eigenvec- 
tor. So in the case of a real inner product space, we see that normality is not 
sufficient to guarantee an orthonormal basis of eigenvectors. All is not lost, 
however. We show that normality suffices if V is a complex inner product 
space. 

Before we prove the promised result for normal operators, we need some 
general properties of normal operators. 


Theorem 6.15. Let V be an inner product space, and let T be a normal operator on V. Then the following statements are true.

(a) $\|T(x)\| = \|T^*(x)\|$ for all $x \in V$.

(b) $T - cI$ is normal for every $c \in F$.

(c) If x is an eigenvector of T, then x is also an eigenvector of $T^*$. In fact, if $T(x) = \lambda x$, then $T^*(x) = \overline{\lambda}x$.

(d) If $\lambda_1$ and $\lambda_2$ are distinct eigenvalues of T with corresponding eigenvectors $x_1$ and $x_2$, then $x_1$ and $x_2$ are orthogonal.


Proof. (a) For any $x \in V$, we have

$$\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle T^*T(x), x\rangle = \langle TT^*(x), x\rangle = \langle T^*(x), T^*(x)\rangle = \|T^*(x)\|^2.$$

The proof of (b) is left as an exercise.
(c) Suppose that $T(x) = \lambda x$ for some $x \in V$. Let $U = T - \lambda I$. Then $U(x) = 0$, and U is normal by (b). Thus (a) implies that

$$0 = \|U(x)\| = \|U^*(x)\| = \|(T^* - \overline{\lambda}I)(x)\| = \|T^*(x) - \overline{\lambda}x\|.$$

Hence $T^*(x) = \overline{\lambda}x$. So x is an eigenvector of $T^*$.
(d) Let $\lambda_1$ and $\lambda_2$ be distinct eigenvalues of T with corresponding eigenvectors $x_1$ and $x_2$. Then, using (c), we have

$$\lambda_1\langle x_1, x_2\rangle = \langle \lambda_1 x_1, x_2\rangle = \langle T(x_1), x_2\rangle = \langle x_1, T^*(x_2)\rangle = \langle x_1, \overline{\lambda_2}x_2\rangle = \lambda_2\langle x_1, x_2\rangle.$$

Since $\lambda_1 \ne \lambda_2$, we conclude that $\langle x_1, x_2\rangle = 0$. ∎
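The following Python sketch checks parts (a) and (c) of Theorem 6.15 on one normal operator that is not self-adjoint: the rotation by $\pi/2$ acting on $\mathbb{C}^2$, with matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which has the eigenvector $x = (1, -i)$ with eigenvalue $i$. The helper names and the sample vector `y` are illustrative choices, not from the text.

```python
def adjoint(A):
    """Conjugate transpose of a square matrix given as nested lists."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A))]


def apply(A, x):
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]


def norm(x):
    return sum(abs(t) ** 2 for t in x) ** 0.5


def close(u, v, tol=1e-12):
    return all(abs(p - q) <= tol for p, q in zip(u, v))


A = [[0, -1], [1, 0]]           # normal: A A* = A* A = I
lam, x = 1j, [1, -1j]           # A x = i x
print(close(apply(A, x), [lam * t for t in x]))                       # True
print(close(apply(adjoint(A), x), [lam.conjugate() * t for t in x]))  # True: A* x = conj(lam) x
y = [2 + 1j, -3]
print(abs(norm(apply(A, y)) - norm(apply(adjoint(A), y))) < 1e-12)    # True: part (a)
```

Note that $A^*x = -ix$, not $ix$, so the conjugate in part (c) cannot be dropped.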


Theorem 6.16. Let T be a linear operator on a finite-dimensional com- 
plex inner product space V. Then T is normal if and only if there exists an 
orthonormal basis for V consisting of eigenvectors of T. 


Proof. Suppose that T is normal. By the fundamental theorem of algebra (Theorem D.4), the characteristic polynomial of T splits. So we may apply Schur's theorem to obtain an orthonormal basis $\beta = \{v_1, v_2, \ldots, v_n\}$ for V such that $[T]_\beta = A$ is upper triangular. We know that $v_1$ is an eigenvector of T because A is upper triangular. Assume that $v_1, v_2, \ldots, v_{k-1}$ are eigenvectors of T. We claim that $v_k$ is also an eigenvector of T. It then follows by mathematical induction on $k$ that all of the $v_i$'s are eigenvectors of T. Consider any $j < k$, and let $\lambda_j$ denote the eigenvalue of T corresponding to $v_j$. By Theorem 6.15, $T^*(v_j) = \overline{\lambda_j}v_j$. Since A is upper triangular,

$$T(v_k) = A_{1k}v_1 + A_{2k}v_2 + \cdots + A_{jk}v_j + \cdots + A_{kk}v_k.$$

Furthermore, by the corollary to Theorem 6.5 (p. 347),

$$A_{jk} = \langle T(v_k), v_j\rangle = \langle v_k, T^*(v_j)\rangle = \langle v_k, \overline{\lambda_j}v_j\rangle = \lambda_j\langle v_k, v_j\rangle = 0.$$

It follows that $T(v_k) = A_{kk}v_k$, and hence $v_k$ is an eigenvector of T. So by induction, all the vectors in $\beta$ are eigenvectors of T.
The converse was already proved on page 370. ∎


Interestingly, as the next example shows, Theorem 6.16 does not extend 
to infinite-dimensional complex inner product spaces. 


Example 3

Consider the inner product space H with the orthonormal set S from Example 9 in Section 6.1. Let $V = \operatorname{span}(S)$, and let T and U be the linear operators on V defined by $T(f) = f_1 f$ and $U(f) = f_{-1} f$. Then

$$T(f_n) = f_{n+1} \quad\text{and}\quad U(f_n) = f_{n-1}$$

for all integers $n$. Thus

$$\langle T(f_m), f_n\rangle = \langle f_{m+1}, f_n\rangle = \delta_{(m+1),n} = \delta_{m,(n-1)} = \langle f_m, f_{n-1}\rangle = \langle f_m, U(f_n)\rangle.$$

It follows that $U = T^*$. Furthermore, $TT^* = I = T^*T$; so T is normal.

We show that T has no eigenvectors. Suppose that f is an eigenvector of T, say, $T(f) = \lambda f$ for some $\lambda$. Since V equals the span of S, we may write

$$f = \sum_{i=n}^{m} a_i f_i, \quad\text{where } a_m \ne 0.$$

Hence

$$\sum_{i=n}^{m} a_i f_{i+1} = T(f) = \lambda f = \sum_{i=n}^{m} \lambda a_i f_i.$$

Since $a_m \ne 0$, we can write $f_{m+1}$ as a linear combination of $f_n, f_{n+1}, \ldots, f_m$. But this is a contradiction because S is linearly independent.


Example 1 illustrates that normality is not sufficient to guarantee the 
existence of an orthonormal basis of eigenvectors for real inner product spaces. 
For real inner product spaces, we must replace normality by the stronger 
condition that T = T* in order to guarantee such a basis. 


Definitions. Let T be a linear operator on an inner product space V. We say that T is self-adjoint (Hermitian) if $T = T^*$. An $n \times n$ real or complex matrix A is self-adjoint (Hermitian) if $A = A^*$.

It follows immediately that if $\beta$ is an orthonormal basis, then T is self-adjoint if and only if $[T]_\beta$ is self-adjoint. For real matrices, this condition reduces to the requirement that A be symmetric.

Before we state our main result for self-adjoint operators, we need some 
preliminary work. 

By definition, a linear operator on a real inner product space has only 
real eigenvalues. The lemma that follows shows that the same can be said 
for self-adjoint operators on a complex inner product space. Similarly, the 
characteristic polynomial of every linear operator on a complex inner product 
space splits, and the same is true for self-adjoint operators on a real inner 
product space. 


Lemma. Let T be a self-adjoint operator on a finite-dimensional inner 
product space V. Then 
(a) Every eigenvalue of T is real. 
(b) Suppose that V is a real inner product space. Then the characteristic 
polynomial of T splits. 


Proof. (a) Suppose that $T(x) = \lambda x$ for $x \ne 0$. Because a self-adjoint operator is also normal, we can apply Theorem 6.15(c) to obtain

$$\lambda x = T(x) = T^*(x) = \overline{\lambda}x.$$

So $\lambda = \overline{\lambda}$; that is, $\lambda$ is real.

(b) Let $n = \dim(V)$, $\beta$ be an orthonormal basis for V, and $A = [T]_\beta$. Then A is self-adjoint. Let $T_A$ be the linear operator on $\mathbb{C}^n$ defined by $T_A(x) = Ax$ for all $x \in \mathbb{C}^n$. Note that $T_A$ is self-adjoint because $[T_A]_\gamma = A$, where $\gamma$ is the standard ordered (orthonormal) basis for $\mathbb{C}^n$. So, by (a), the eigenvalues of $T_A$ are real. By the fundamental theorem of algebra, the characteristic polynomial of $T_A$ splits into factors of the form $t - \lambda$. Since each $\lambda$ is real, the characteristic polynomial splits over $\mathbb{R}$. But $T_A$ has the same characteristic polynomial as A, which has the same characteristic polynomial as T. Therefore the characteristic polynomial of T splits. ∎
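Part (a) of the lemma can be seen concretely for $2 \times 2$ matrices. For a self-adjoint $A = \begin{pmatrix} a & b \\ \overline{b} & d \end{pmatrix}$ with $a, d$ real, the discriminant of the characteristic polynomial is $(a - d)^2 + 4|b|^2 \ge 0$, so both roots are real. The Python sketch below (function name hypothetical) computes the roots with the quadratic formula and contrasts a self-adjoint matrix with the rotation of Example 1, which is normal but not self-adjoint.

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of t^2 - (a + d) t + (ad - bc), the characteristic polynomial of A."""
    (a, b), (c, d) = A
    root = cmath.sqrt((a - d) ** 2 + 4 * b * c)
    return ((a + d) + root) / 2, ((a + d) - root) / 2


H = [[2, 1 - 1j], [1 + 1j, 3]]             # self-adjoint: equals its conjugate transpose
print(eigenvalues_2x2(H))                  # both roots real (4 and 1)
print(eigenvalues_2x2([[0, -1], [1, 0]]))  # rotation: roots i and -i, not real
```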


We are now able to establish one of the major results of this chapter. 


Theorem 6.17. Let T be a linear operator on a finite-dimensional real inner product space V. Then T is self-adjoint if and only if there exists an orthonormal basis $\beta$ for V consisting of eigenvectors of T.

Proof. Suppose that T is self-adjoint. By the lemma, we may apply Schur's theorem to obtain an orthonormal basis $\beta$ for V such that the matrix $A = [T]_\beta$ is upper triangular. But

$$A^* = [T]_\beta^* = [T^*]_\beta = [T]_\beta = A.$$

So A and $A^*$ are both upper triangular, and therefore A is a diagonal matrix. Thus $\beta$ must consist of eigenvectors of T.
The converse is left as an exercise. ∎
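For a real symmetric $2 \times 2$ matrix, Theorem 6.17 can be made explicit: a rotation through the angle $\theta$ with $\tan 2\theta = 2b/(a - d)$ sends the standard basis to an orthonormal basis of eigenvectors of $A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}$, since the off-diagonal entry of $Q^tAQ$ is $\frac{1}{2}(d - a)\sin 2\theta + b\cos 2\theta$. This single Jacobi rotation is a standard technique that is not taken from the text; the Python sketch and its function name are illustrative.

```python
import math


def symmetric_eigenbasis_2x2(a, b, d):
    """Eigenpairs (lam, v) of the real symmetric matrix [[a, b], [b, d]].

    v1 = (cos t, sin t) and v2 = (-sin t, cos t) with tan(2t) = 2b / (a - d)
    form an orthonormal basis in which the matrix is diagonal.
    """
    t = 0.5 * math.atan2(2 * b, a - d)
    cs, sn = math.cos(t), math.sin(t)
    v1, v2 = (cs, sn), (-sn, cs)
    lam1 = a * cs * cs + 2 * b * cs * sn + d * sn * sn   # v1^t A v1
    lam2 = a * sn * sn - 2 * b * cs * sn + d * cs * cs   # v2^t A v2
    return (lam1, v1), (lam2, v2)


for lam, v in symmetric_eigenbasis_2x2(2, 1, 2):
    print(round(lam, 12), v)   # eigenvalues 3 and 1, eigenvectors at 45 degrees
```

Using `math.atan2` rather than `math.atan` avoids dividing by $a - d$, so the case $a = d$ needs no special handling.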


Theorem 6.17 is used extensively in many areas of mathematics and statis- 
tics. We restate this theorem in matrix form in the next section. 


Example 4

As we noted earlier, real symmetric matrices are self-adjoint, and self-adjoint matrices are normal. The following matrix A is complex and symmetric:

$$A = \begin{pmatrix} i & i \\ i & 1 \end{pmatrix} \quad\text{and}\quad A^* = \begin{pmatrix} -i & -i \\ -i & 1 \end{pmatrix}.$$

But A is not normal, because $(AA^*)_{12} = 1 + i$ and $(A^*A)_{12} = 1 - i$. Therefore complex symmetric matrices need not be normal.


EXERCISES 


1. Label the following statements as true or false. Assume that the underlying inner product spaces are finite-dimensional.

(a) Every self-adjoint operator is normal.
(b) Operators and their adjoints have the same eigenvectors.
(c) If T is an operator on an inner product space V, then T is normal if and only if $[T]_\beta$ is normal, where $\beta$ is any ordered basis for V.
(d) A real or complex matrix A is normal if and only if $L_A$ is normal.
(e) The eigenvalues of a self-adjoint operator must all be real.
(f) The identity and zero operators are self-adjoint.
(g) Every normal operator is diagonalizable.
(h) Every self-adjoint operator is diagonalizable.
(h) Every self-adjoint operator is diagonalizable. 


2. For each linear operator T on an inner product space V, determine whether T is normal, self-adjoint, or neither. If possible, produce an orthonormal basis of eigenvectors of T for V and list the corresponding eigenvalues.

(a) $V = \mathbb{R}^2$ and T is defined by $T(a, b) = (2a - 2b, -2a + 5b)$.
(b) $V = \mathbb{R}^3$ and T is defined by $T(a, b, c) = (-a + b, 5b, 4a - 2b + 5c)$.
(c) $V = \mathbb{C}^2$ and T is defined by $T(a, b) = (2a + ib, a + 2b)$.
(d) $V = P_2(\mathbb{R})$ and T is defined by $T(f) = f'$, where

$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt.$$

(e) $V = M_{2\times 2}(\mathbb{R})$ and T is defined by $T(A) = A^t$.
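(f) $V = M_{2\times 2}(\mathbb{R})$ and T is defined by $T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} c & d \\ a & b \end{pmatrix}$.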
3. Give an example of a linear operator T on $\mathbb{R}^2$ and an ordered basis for $\mathbb{R}^2$ that provides a counterexample to the statement in Exercise 1(c).


4. Let T and U be self-adjoint operators on an inner product space V. 
Prove that TU is self-adjoint if and only if TU = UT. 


5. Prove (b) of Theorem 6.15. 


6. Let V be a complex inner product space, and let T be a linear operator on V. Define

$$T_1 = \frac{1}{2}(T + T^*) \quad\text{and}\quad T_2 = \frac{1}{2i}(T - T^*).$$

(a) Prove that $T_1$ and $T_2$ are self-adjoint and that $T = T_1 + iT_2$.
(b) Suppose also that $T = U_1 + iU_2$, where $U_1$ and $U_2$ are self-adjoint. Prove that $U_1 = T_1$ and $U_2 = T_2$.
(c) Prove that T is normal if and only if $T_1T_2 = T_2T_1$.


7. Let T be a linear operator on an inner product space V, and let W be a T-invariant subspace of V. Prove the following results.

(a) If T is self-adjoint, then $T_W$ is self-adjoint.
(b) $W^\perp$ is $T^*$-invariant.
(c) If W is both T- and $T^*$-invariant, then $(T_W)^* = (T^*)_W$.
(d) If W is both T- and $T^*$-invariant and T is normal, then $T_W$ is normal.
8. Let T be a normal operator on a finite-dimensional complex inner product space V, and let W be a subspace of V. Prove that if W is T-invariant, then W is also $T^*$-invariant. Hint: Use Exercise 24 of Section 5.4.

9. Let T be a normal operator on a finite-dimensional inner product space V. Prove that $N(T) = N(T^*)$ and $R(T) = R(T^*)$. Hint: Use Theorem 6.15 and Exercise 12 of Section 6.3.

10. Let T be a self-adjoint operator on a finite-dimensional inner product space V. Prove that for all $x \in V$

$$\|T(x) \pm ix\|^2 = \|T(x)\|^2 + \|x\|^2.$$

Deduce that $T - iI$ is invertible and that $[(T - iI)^{-1}]^* = (T + iI)^{-1}$.

11. Assume that T is a linear operator on a complex (not necessarily finite-dimensional) inner product space V with an adjoint $T^*$. Prove the following results.

(a) If T is self-adjoint, then $\langle T(x), x\rangle$ is real for all $x \in V$.
(b) If T satisfies $\langle T(x), x\rangle = 0$ for all $x \in V$, then $T = T_0$. Hint: Replace x by $x + y$ and then by $x + iy$, and expand the resulting inner products.
(c) If $\langle T(x), x\rangle$ is real for all $x \in V$, then $T = T^*$.

12. Let T be a normal operator on a finite-dimensional real inner product space V whose characteristic polynomial splits. Prove that V has an orthonormal basis of eigenvectors of T. Hence prove that T is self-adjoint.

13. An $n \times n$ real matrix A is said to be a Gramian matrix if there exists a real (square) matrix B such that $A = B^tB$. Prove that A is a Gramian matrix if and only if A is symmetric and all of its eigenvalues are nonnegative. Hint: Apply Theorem 6.17 to $T = L_A$ to obtain an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ of eigenvectors with the associated eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Define the linear operator U by $U(v_i) = \sqrt{\lambda_i}\,v_i$.

14. Simultaneous Diagonalization. Let V be a finite-dimensional real inner product space, and let U and T be self-adjoint linear operators on V such that $UT = TU$. Prove that there exists an orthonormal basis for V consisting of vectors that are eigenvectors of both U and T. (The complex version of this result appears as Exercise 10 of Section 6.6.) Hint: For any eigenspace $W = E_\lambda$ of T, we have that W is both T- and U-invariant. By Exercise 7, we have that $W^\perp$ is both T- and U-invariant. Apply Theorem 6.17 and Theorem 6.6 (p. 350).
15. Let A and B be symmetric $n \times n$ matrices such that $AB = BA$. Use Exercise 14 to prove that there exists an orthogonal matrix P such that $P^*AP$ and $P^*BP$ are both diagonal matrices.

16. Prove the Cayley–Hamilton theorem for a complex $n \times n$ matrix A. That is, if $f(t)$ is the characteristic polynomial of A, prove that $f(A) = O$. Hint: Use Schur's theorem to show that A may be assumed to be upper triangular, in which case

$$f(t) = \prod_{i=1}^{n} (A_{ii} - t).$$

Now if $T = L_A$, we have $(A_{jj}I - T)(e_j) \in \operatorname{span}(\{e_1, e_2, \ldots, e_{j-1}\})$ for $j \ge 2$, where $\{e_1, e_2, \ldots, e_n\}$ is the standard ordered basis for $\mathbb{C}^n$. (The general case is proved in Section 5.4.)


The following definitions are used in Exercises 17 through 23.

Definitions. A linear operator T on a finite-dimensional inner product space is called positive definite [positive semidefinite] if T is self-adjoint and $\langle T(x), x\rangle > 0$ [$\langle T(x), x\rangle \ge 0$] for all $x \ne 0$.

An $n \times n$ matrix A with entries from $\mathbb{R}$ or $\mathbb{C}$ is called positive definite [positive semidefinite] if $L_A$ is positive definite [positive semidefinite].

17. Let T and U be self-adjoint linear operators on an n-dimensional inner product space V, and let $A = [T]_\beta$, where $\beta$ is an orthonormal basis for V. Prove the following results.

(a) T is positive definite [semidefinite] if and only if all of its eigenvalues are positive [nonnegative].
(b) T is positive definite if and only if

$$\sum_{i,j} A_{ij}a_j\overline{a_i} > 0 \quad\text{for all nonzero } n\text{-tuples } (a_1, a_2, \ldots, a_n).$$

(c) T is positive semidefinite if and only if $A = B^*B$ for some square matrix B.
(d) If T and U are positive semidefinite operators such that $T^2 = U^2$, then $T = U$.
(e) If T and U are positive definite operators such that $TU = UT$, then TU is positive definite.
(f) T is positive definite [semidefinite] if and only if A is positive definite [semidefinite].

Because of (f), results analogous to items (a) through (d) hold for matrices as well as operators.
18. Let $T\colon V \to W$ be a linear transformation, where V and W are finite-dimensional inner product spaces. Prove the following results.

(a) $T^*T$ and $TT^*$ are positive semidefinite. (See Exercise 15 of Section 6.3.)
(b) $\operatorname{rank}(T^*T) = \operatorname{rank}(TT^*) = \operatorname{rank}(T)$.

19. Let T and U be positive definite operators on an inner product space V. Prove the following results.

(a) $T + U$ is positive definite.
(b) If $c > 0$, then cT is positive definite.
(c) $T^{-1}$ is positive definite.

20. Let V be an inner product space with inner product $\langle\cdot,\cdot\rangle$, and let T be a positive definite linear operator on V. Prove that $\langle x, y\rangle' = \langle T(x), y\rangle$ defines another inner product on V.

21. Let V be a finite-dimensional inner product space, and let T and U be self-adjoint operators on V such that T is positive definite. Prove that both TU and UT are diagonalizable linear operators that have only real eigenvalues. Hint: Show that UT is self-adjoint with respect to the inner product $\langle x, y\rangle' = \langle T(x), y\rangle$. To show that TU is self-adjoint, repeat the argument with $T^{-1}$ in place of T.

22. This exercise provides a converse to Exercise 20. Let V be a finite-dimensional inner product space with inner product $\langle\cdot,\cdot\rangle$, and let $\langle\cdot,\cdot\rangle'$ be any other inner product on V.

(a) Prove that there exists a unique linear operator T on V such that $\langle x, y\rangle' = \langle T(x), y\rangle$ for all x and y in V. Hint: Let $\beta = \{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis for V with respect to $\langle\cdot,\cdot\rangle$, and define a matrix A by $A_{ij} = \langle v_j, v_i\rangle'$ for all i and j. Let T be the unique linear operator on V such that $[T]_\beta = A$.
(b) Prove that the operator T of (a) is positive definite with respect to both inner products.

23. Let U be a diagonalizable linear operator on a finite-dimensional inner product space V such that all of the eigenvalues of U are real. Prove that there exist positive definite linear operators $T_1$ and $T_1'$ and self-adjoint linear operators $T_2$ and $T_2'$ such that $U = T_2T_1 = T_1'T_2'$. Hint: Let $\langle\cdot,\cdot\rangle$ be the inner product associated with V, $\beta$ a basis of eigenvectors for U, $\langle\cdot,\cdot\rangle'$ the inner product on V with respect to which $\beta$ is orthonormal (see Exercise 22(a) of Section 6.1), and $T_1$ the positive definite operator according to Exercise 22. Show that U is self-adjoint with respect to $\langle\cdot,\cdot\rangle'$ and $U = T_1^{-1}U^*T_1$ (the adjoint is with respect to $\langle\cdot,\cdot\rangle$). Let …
24. This argument gives another proof of Schur's theorem. Let T be a linear operator on a finite-dimensional inner product space V.

(a) Suppose that $\beta$ is an ordered basis for V such that $[T]_\beta$ is an upper triangular matrix. Let $\gamma$ be the orthonormal basis for V obtained by applying the Gram–Schmidt orthogonalization process to $\beta$ and then normalizing the resulting vectors. Prove that $[T]_\gamma$ is an upper triangular matrix.
(b) Use Exercise 32 of Section 5.4 and (a) to obtain an alternate proof of Schur's theorem.


6.5 UNITARY AND ORTHOGONAL OPERATORS 
AND THEIR MATRICES 


In this section, we continue our analogy between complex numbers and linear operators. Recall that the adjoint of a linear operator acts similarly to the conjugate of a complex number (see, for example, Theorem 6.11 p. 359). A complex number z has length 1 if $z\overline{z} = 1$. In this section, we study those linear operators T on an inner product space V such that $TT^* = T^*T = I$. We will see that these are precisely the linear operators that "preserve length" in the sense that $\|T(x)\| = \|x\|$ for all $x \in V$. As another characterization, we prove that, on a finite-dimensional complex inner product space, these are the normal operators whose eigenvalues all have absolute value 1.

In past chapters, we were interested in studying those functions that pre- 
serve the structure of the underlying space. In particular, linear operators 
preserve the operations of vector addition and scalar multiplication, and iso- 
morphisms preserve all the vector space structure. It is now natural to con- 
sider those linear operators T on an inner product space that preserve length. 
We will see that this condition guarantees, in fact, that T preserves the inner 
product. 


Definitions. Let T be a linear operator on a finite-dimensional inner product space V (over F). If $\|T(x)\| = \|x\|$ for all $x \in V$, we call T a unitary operator if $F = \mathbb{C}$ and an orthogonal operator if $F = \mathbb{R}$.


It should be noted that, in the infinite-dimensional case, an operator sat- 
isfying the preceding norm requirement is generally called an isometry. If, 
in addition, the operator is onto (the condition guarantees one-to-one), then 
the operator is called a unitary or orthogonal operator. 

Clearly, any rotation or reflection in $\mathbb{R}^2$ preserves length and hence is
an orthogonal operator. We study these operators in much more detail in 
Section 6.11. 
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Example 1 


Let h € H satisfy |h(av)| = 1 for all x. Define the linear operator T on H by 
T(f) =hf. Then 


ITIP = ase = =f MOOR FHae= Ws? 


since |h(t)|? = 1 for all t. So T is a unitary operator. 


Theorem 6.18. Let T be a linear operator on a finite-dimensional inner 
product space V. Then the following statements are equivalent. 

(a) $TT^* = T^*T = I$.

(b) $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$.

(c) If $\beta$ is an orthonormal basis for V, then $T(\beta)$ is an orthonormal basis
for V.

(d) There exists an orthonormal basis $\beta$ for V such that $T(\beta)$ is an orthonormal
basis for V.

(e) $\|T(x)\| = \|x\|$ for all $x \in V$.


Thus all the conditions above are equivalent to the definition of a uni- 
tary or orthogonal operator. From (a), it follows that unitary or orthogonal 
operators are normal. 

Before proving the theorem, we first prove a lemma. Compare this lemma 
to Exercise 11(b) of Section 6.4. 


Lemma. Let U be a self-adjoint operator on a finite-dimensional inner
product space V. If $\langle x, U(x)\rangle = 0$ for all $x \in V$, then $U = T_0$.


Proof. By either Theorem 6.16 (p. 372) or 6.17 (p. 374), we may choose
an orthonormal basis $\beta$ for V consisting of eigenvectors of U. If $x \in \beta$, then
$U(x) = \lambda x$ for some $\lambda$. Thus

$$0 = \langle x, U(x)\rangle = \langle x, \lambda x\rangle = \bar{\lambda}\,\langle x, x\rangle;$$

so $\bar{\lambda} = 0$. Hence $U(x) = 0$ for all $x \in \beta$, and thus $U = T_0$. ∎
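The self-adjointness hypothesis in the lemma matters. As a quick illustration on the real space $\mathbb{R}^2$ (an example chosen here, not taken from the text), the rotation by $\pi/2$, $U(a, b) = (-b, a)$, satisfies $\langle x, U(x)\rangle = 0$ for every $x$, yet it is neither the zero operator nor self-adjoint. The sketch below only checks this on sampled vectors; the helper names are illustrative.

```python
import math


def apply(M, x):
    """Multiply a 2x2 real matrix, stored as a list of rows, by a vector in R^2."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]


def dot(x, y):
    """Standard inner product on R^2."""
    return x[0] * y[0] + x[1] * y[1]


# Rotation by pi/2: U(a, b) = (-b, a).
U = [[0.0, -1.0], [1.0, 0.0]]

# Sample vectors on an ellipse; <x, U(x)> should vanish for each of them.
samples = [(math.cos(0.1 * k), 2.0 * math.sin(0.1 * k)) for k in range(63)]
max_abs_inner = max(abs(dot(x, apply(U, x))) for x in samples)

# Yet U is not the zero operator, and its matrix is not symmetric (not self-adjoint).
is_zero_operator = all(entry == 0.0 for row in U for entry in row)
is_self_adjoint = U[0][1] == U[1][0]

print(max_abs_inner, is_zero_operator, is_self_adjoint)
```

Over a complex space the conclusion $U = T_0$ holds even without self-adjointness (by a polarization argument), so this example depends on the scalars being real.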


Proof of Theorem 6.18. We prove first that (a) implies (b). Let $x, y \in V$.
Then $\langle x, y\rangle = \langle T^*T(x), y\rangle = \langle T(x), T(y)\rangle$.

Second, we prove that (b) implies (c). Let $\beta = \{v_1, v_2, \ldots, v_n\}$ be an
orthonormal basis for V; so $T(\beta) = \{T(v_1), T(v_2), \ldots, T(v_n)\}$. It follows that
$\langle T(v_i), T(v_j)\rangle = \langle v_i, v_j\rangle = \delta_{ij}$. Therefore $T(\beta)$ is an orthonormal basis for V.

That (c) implies (d) is obvious. 

Next we prove that (d) implies (e). Let $x \in V$, and let $\beta = \{v_1, v_2, \ldots, v_n\}$.
Now

$$x = \sum_{i=1}^{n} a_i v_i$$

for some scalars $a_i$, and so

$$\|x\|^2 = \Big\langle \sum_{i=1}^{n} a_i v_i, \sum_{j=1}^{n} a_j v_j \Big\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i \overline{a_j}\,\langle v_i, v_j\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i \overline{a_j}\,\delta_{ij} = \sum_{i=1}^{n} |a_i|^2,$$

since $\beta$ is orthonormal.
Applying the same manipulations to

$$T(x) = \sum_{i=1}^{n} a_i T(v_i)$$

and using the fact that $T(\beta)$ is also orthonormal, we obtain

$$\|T(x)\|^2 = \sum_{i=1}^{n} |a_i|^2.$$

Hence $\|T(x)\| = \|x\|$.
Finally, we prove that (e) implies (a). For any $x \in V$, we have

$$\langle x, x\rangle = \|x\|^2 = \|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle x, T^*T(x)\rangle.$$

So $\langle x, (I - T^*T)(x)\rangle = 0$ for all $x \in V$. Let $U = I - T^*T$; then U is self-adjoint,
and $\langle x, U(x)\rangle = 0$ for all $x \in V$. Hence, by the lemma, we have
$T_0 = U = I - T^*T$, and therefore $T^*T = I$. Since V is finite-dimensional, we
may use Exercise 10 of Section 2.4 to conclude that $TT^* = I$. ∎
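As a numerical sanity check (not a proof) of Theorem 6.18, the sketch below tests conditions (a), (b), (c), and (e) for one assumed unitary matrix $A = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$ acting on $\mathbb{C}^2$ with the standard inner product $\langle x, y\rangle = \sum_k x_k\overline{y_k}$. The matrix, the sample vectors, and the helper names are illustrative choices.

```python
import math


def conj_transpose(M):
    """Conjugate transpose A* of a matrix stored as a list of rows."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(A, x):
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]


def inner(x, y):
    """Standard inner product on C^n: sum of x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def close(u, v, tol=1e-12):
    return abs(u - v) < tol


s = 1 / math.sqrt(2)
A = [[s, 1j * s], [1j * s, s]]  # assumed example of a unitary matrix
n = len(A)

# (a) A*A = AA* = I
AsA, AAs = matmul(conj_transpose(A), A), matmul(A, conj_transpose(A))
cond_a = all(close(AsA[i][j], 1.0 if i == j else 0.0) and close(AAs[i][j], 1.0 if i == j else 0.0)
             for i in range(n) for j in range(n))

x, y = [1 + 2j, -0.5j], [3.0, 1 - 1j]
# (b) inner products are preserved
cond_b = close(inner(apply(A, x), apply(A, y)), inner(x, y))
# (c) the image of the standard orthonormal basis (the columns of A) is orthonormal
cols = [[A[i][j] for i in range(n)] for j in range(n)]
cond_c = all(close(inner(cols[i], cols[j]), 1.0 if i == j else 0.0)
             for i in range(n) for j in range(n))
# (e) lengths are preserved
cond_e = close(abs(inner(apply(A, x), apply(A, x))), abs(inner(x, x)))

print(cond_a, cond_b, cond_c, cond_e)
```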


It follows immediately from the definition that every eigenvalue of a uni- 
tary or orthogonal operator has absolute value 1. In fact, even more is true. 


Corollary 1. Let T be a linear operator on a finite-dimensional real 
inner product space V. Then V has an orthonormal basis of eigenvectors of 
T with corresponding eigenvalues of absolute value 1 if and only if T is both 
self-adjoint and orthogonal. 


Proof. Suppose that V has an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ such that
$T(v_i) = \lambda_i v_i$ and $|\lambda_i| = 1$ for all $i$. By Theorem 6.17 (p. 374), T is self-adjoint.
Thus $(TT^*)(v_i) = T(\lambda_i v_i) = \lambda_i\lambda_i v_i = \lambda_i^2 v_i = v_i$ for each $i$. So $TT^* = I$, and
again by Exercise 10 of Section 2.4, T is orthogonal by Theorem 6.18(a).

If T is self-adjoint, then, by Theorem 6.17, we have that V possesses an
orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ such that $T(v_i) = \lambda_i v_i$ for all $i$. If T is also
orthogonal, we have

$$|\lambda_i| \cdot \|v_i\| = \|\lambda_i v_i\| = \|T(v_i)\| = \|v_i\|,$$

so $|\lambda_i| = 1$ for every $i$. ∎

Corollary 2. Let T be a linear operator on a finite-dimensional complex
inner product space V. Then V has an orthonormal basis of eigenvectors of T
with corresponding eigenvalues of absolute value 1 if and only if T is unitary.


Proof. The proof is similar to the proof of Corollary 1. ∎


Example 2 


Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be a rotation by $\theta$, where $0 < \theta < \pi$. It is clear geometrically
that T “preserves length,” that is, that $\|T(x)\| = \|x\|$ for all $x \in \mathbb{R}^2$. The
fact that rotations by a fixed angle preserve perpendicularity not only can be
seen geometrically but now follows from (b) of Theorem 6.18. Perhaps the
fact that such a transformation preserves the inner product is not so obvious;
however, we obtain this fact from (b) also. Finally, an inspection of the matrix
representation of T with respect to the standard ordered basis, which is

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

reveals that T is not self-adjoint for the given restriction on $\theta$. As we mentioned
earlier, this fact also follows from the geometric observation that T
has no eigenvectors and from Theorem 6.15 (p. 371). It is seen easily from
the preceding matrix that $T^*$ is the rotation by $-\theta$.
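The claims of Example 2 can be spot-checked numerically for a sample angle. In this sketch, $\theta = 1$ is an arbitrary assumed value in $(0, \pi)$; it checks that the rotation matrix preserves the standard inner product, that its transpose (the matrix of $T^*$, since the standard basis is orthonormal) equals the rotation by $-\theta$, and that the matrix is not symmetric.

```python
import math


def rotation(theta):
    """Matrix of the rotation of R^2 by theta in the standard ordered basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]


def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]


def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]


theta = 1.0                       # any angle with 0 < theta < pi
R = rotation(theta)
x, y = [2.0, -1.0], [0.5, 3.0]

# Theorem 6.18(b): the rotation preserves inner products (hence perpendicularity).
preserves_inner = abs(dot(apply(R, x), apply(R, y)) - dot(x, y)) < 1e-12

# The adjoint (the transpose, for a real matrix) is the rotation by -theta ...
Rt, Rminus = transpose(R), rotation(-theta)
adjoint_is_reverse = all(abs(Rt[i][j] - Rminus[i][j]) < 1e-12 for i in range(2) for j in range(2))

# ... and R is not symmetric, since sin(theta) != 0 for 0 < theta < pi.
self_adjoint = abs(R[0][1] - R[1][0]) < 1e-12

print(preserves_inner, adjoint_is_reverse, self_adjoint)
```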


Definition. Let L be a one-dimensional subspace of $\mathbb{R}^2$. We may view L
as a line in the plane through the origin. A linear operator T on $\mathbb{R}^2$ is called
a reflection of $\mathbb{R}^2$ about L if $T(x) = x$ for all $x \in L$ and $T(x) = -x$ for all
$x \in L^\perp$.


As an example of a reflection, consider the operator defined in Example 3 of 
Section 2.5. 


Example 3 


Let T be a reflection of $\mathbb{R}^2$ about a line L through the origin. We show that
T is an orthogonal operator. Select vectors $v_1 \in L$ and $v_2 \in L^\perp$ such that
$\|v_1\| = \|v_2\| = 1$. Then $T(v_1) = v_1$ and $T(v_2) = -v_2$. Thus $v_1$ and $v_2$
are eigenvectors of T with corresponding eigenvalues 1 and −1, respectively.
Furthermore, $\{v_1, v_2\}$ is an orthonormal basis for $\mathbb{R}^2$. It follows that T is an
orthogonal operator by Corollary 1 to Theorem 6.18.


We now examine the matrices that represent unitary and orthogonal trans- 
formations. 


Definitions. A square matrix A is called an orthogonal matrix if
$A^tA = AA^t = I$ and unitary if $A^*A = AA^* = I$.

Since for a real matrix A we have $A^* = A^t$, a real unitary matrix is also
orthogonal. In this case, we call A orthogonal rather than unitary.

Note that the condition $AA^* = I$ is equivalent to the statement that the
rows of A form an orthonormal basis for $F^n$ because

$$\delta_{ij} = I_{ij} = (AA^*)_{ij} = \sum_{k=1}^{n} A_{ik}(A^*)_{kj} = \sum_{k=1}^{n} A_{ik}\overline{A_{jk}},$$

and the last term represents the inner product of the $i$th and $j$th rows of A.
A similar remark can be made about the columns of A and the condition
$A^*A = I$.

It also follows from the definition above and from Theorem 6.10 (p. 359)
that a linear operator T on an inner product space V is unitary [orthogonal]
if and only if $[T]_\beta$ is unitary [orthogonal] for some orthonormal basis $\beta$ for V.
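The identity $(AA^*)_{ij} = \langle \text{row } i, \text{row } j\rangle$ used above is easy to check by machine. This sketch uses an assumed $2 \times 2$ complex example, $A = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$, and compares the entries of $AA^*$ with the inner products of the rows; the helper names are illustrative.

```python
import math


def conj_transpose(M):
    """Conjugate transpose A* of a matrix stored as a list of rows."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inner(x, y):
    """Standard inner product on C^n: sum of x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


s = 1 / math.sqrt(2)
A = [[s, s], [1j * s, -1j * s]]  # assumed example; its rows are orthonormal in C^2
n = len(A)

AAs = matmul(A, conj_transpose(A))
# Entry (i, j) of AA* should be the inner product of row i with row j.
gram = [[inner(A[i], A[j]) for j in range(n)] for i in range(n)]

entries_agree = all(abs(AAs[i][j] - gram[i][j]) < 1e-12 for i in range(n) for j in range(n))
rows_orthonormal = all(abs(gram[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                       for i in range(n) for j in range(n))

print(entries_agree, rows_orthonormal)
```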


Example 4 


From Example 2, the matrix

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

is clearly orthogonal. One can easily see that the rows of the matrix form
an orthonormal basis for $\mathbb{R}^2$. Similarly, the columns of the matrix form an
orthonormal basis for $\mathbb{R}^2$.


Example 5 


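Let T be a reflection of $\mathbb{R}^2$ about a line L through the origin, let $\beta$ be the
standard ordered basis for $\mathbb{R}^2$, and let $A = [T]_\beta$. Then $T = L_A$. Since T is
an orthogonal operator and $\beta$ is an orthonormal basis, A is an orthogonal
matrix. We describe A.

Suppose that $\alpha$ is the angle from the positive x-axis to L. Let
$v_1 = (\cos\alpha, \sin\alpha)$ and $v_2 = (-\sin\alpha, \cos\alpha)$. Then $\|v_1\| = \|v_2\| = 1$, $v_1 \in L$,
and $v_2 \in L^\perp$. Hence $\gamma = \{v_1, v_2\}$ is an orthonormal basis for $\mathbb{R}^2$. Because
$T(v_1) = v_1$ and $T(v_2) = -v_2$, we have

$$[T]_\gamma = [L_A]_\gamma = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Let

$$Q = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}.$$

By the corollary to Theorem 2.23 (p. 115),

$$\begin{aligned}
A &= Q\,[L_A]_\gamma\,Q^{-1} \\
&= \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}
   \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
   \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} \\
&= \begin{pmatrix} \cos^2\alpha - \sin^2\alpha & 2\sin\alpha\cos\alpha \\ 2\sin\alpha\cos\alpha & -(\cos^2\alpha - \sin^2\alpha) \end{pmatrix} \\
&= \begin{pmatrix} \cos 2\alpha & \sin 2\alpha \\ \sin 2\alpha & -\cos 2\alpha \end{pmatrix}.
\end{aligned}$$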
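The matrix computation in Example 5 can be spot-checked numerically. The sketch below picks an assumed angle $\alpha = 0.7$, forms $Q[L_A]_\gamma Q^{-1}$ using $Q^{-1} = Q^t$, and compares the result with the closed form; it also checks that $v_1$ is fixed and that the determinant is $-1$.

```python
import math


def matmul(A, B):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


alpha = 0.7  # assumed sample angle from the positive x-axis to L
c, s = math.cos(alpha), math.sin(alpha)
Q = [[c, -s], [s, c]]          # columns are v1 = (cos a, sin a) and v2 = (-sin a, cos a)
Q_inv = [[c, s], [-s, c]]      # Q is orthogonal, so Q^{-1} = Q^t
D = [[1.0, 0.0], [0.0, -1.0]]  # [L_A]_gamma

A = matmul(matmul(Q, D), Q_inv)
expected = [[math.cos(2 * alpha), math.sin(2 * alpha)],
            [math.sin(2 * alpha), -math.cos(2 * alpha)]]

matches = all(abs(A[i][j] - expected[i][j]) < 1e-12 for i in range(2) for j in range(2))
# A v1 should equal v1, since v1 lies on the mirror line L.
fixes_v1 = all(abs(A[i][0] * c + A[i][1] * s - (c, s)[i]) < 1e-12 for i in range(2))
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(matches, fixes_v1, det)
```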


We know that, for a complex normal [real symmetric] matrix A, there
exists an orthonormal basis $\beta$ for $F^n$ consisting of eigenvectors of A. Hence A
is similar to a diagonal matrix D. By the corollary to Theorem 2.23 (p. 115),
the matrix Q whose columns are the vectors in $\beta$ is such that $D = Q^{-1}AQ$.
But since the columns of Q are an orthonormal basis for $F^n$, it follows that Q
is unitary [orthogonal]. In this case, we say that A is unitarily equivalent
[orthogonally equivalent] to D. It is easily seen (see Exercise 18) that this
relation is an equivalence relation on $M_{n\times n}(\mathbb{C})$ [$M_{n\times n}(\mathbb{R})$]. More generally,
A and B are unitarily equivalent [orthogonally equivalent] if and only if there
exists a unitary [orthogonal] matrix P such that $A = P^*BP$.

The preceding paragraph has proved half of each of the next two theo- 
rems. 


Theorem 6.19. Let A be a complex n x n matrix. Then A is normal if 
and only if A is unitarily equivalent to a diagonal matrix. 


Proof. By the preceding remarks, we need only prove that if A is unitarily 
equivalent to a diagonal matrix, then A is normal. 

Suppose that $A = P^*DP$, where P is a unitary matrix and D is a diagonal
matrix. Then

$$AA^* = (P^*DP)(P^*DP)^* = (P^*DP)(P^*D^*P) = P^*DID^*P = P^*DD^*P.$$

Similarly, $A^*A = P^*D^*DP$. Since D is a diagonal matrix, however, we have
$DD^* = D^*D$. Thus $AA^* = A^*A$. ∎


Theorem 6.20. Let A be a real n x n matrix. Then A is symmetric if 
and only if A is orthogonally equivalent to a real diagonal matrix. 


Proof. The proof is similar to the proof of Theorem 6.19 and is left as an
exercise. ∎
Example 6
Let

$$A = \begin{pmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{pmatrix}.$$

Since A is symmetric, Theorem 6.20 tells us that A is orthogonally equivalent
to a diagonal matrix. We find an orthogonal matrix P and a diagonal matrix
D such that $P^tAP = D$.


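To find P, we obtain an orthonormal basis of eigenvectors. It is easy to
show that the eigenvalues of A are 2 and 8. The set $\{(-1, 1, 0), (-1, 0, 1)\}$
is a basis for the eigenspace corresponding to 2. Because this set is not
orthogonal, we apply the Gram–Schmidt process to obtain the orthogonal
set $\{(-1, 1, 0), -\frac{1}{2}(1, 1, -2)\}$. The set $\{(1, 1, 1)\}$ is a basis for the eigenspace
corresponding to 8. Notice that $(1, 1, 1)$ is orthogonal to the preceding two
vectors, as predicted by Theorem 6.15(d) (p. 371). Taking the union of these
two bases and normalizing the vectors, we obtain the following orthonormal
basis for $\mathbb{R}^3$ consisting of eigenvectors of A:

$$\left\{ \frac{1}{\sqrt{2}}(-1, 1, 0),\ \frac{1}{\sqrt{6}}(1, 1, -2),\ \frac{1}{\sqrt{3}}(1, 1, 1) \right\}.$$

Thus one possible choice for P is

$$P = \begin{pmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix}, \quad\text{and}\quad D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{pmatrix}.$$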

Because of Schur’s theorem (Theorem 6.14 p. 370), the next result is 
immediate. As it is the matrix form of Schur’s theorem, we also refer to it as 
Schur’s theorem. 


Theorem 6.21 (Schur). Let $A \in M_{n\times n}(F)$ be a matrix whose characteristic
polynomial splits over F.
(a) If F = C, then A is unitarily equivalent to a complex upper triangular 
matrix. 
(b) If F = R, then A is orthogonally equivalent to a real upper triangular 
matrix. 


Rigid Motions* 


The purpose of this application is to characterize the so-called rigid mo- 
tions of a finite-dimensional real inner product space. One may think intu- 
itively of such a motion as a transformation that does not affect the shape of 
a figure under its action, hence the term rigid. The key requirement for such 
a transformation is that it preserves distances. 


Definition. Let V be a real inner product space. A function $f: V \to V$
is called a rigid motion if

$$\|f(x) - f(y)\| = \|x - y\|$$

for all $x, y \in V$.


For example, any orthogonal operator on a finite-dimensional real inner 
product space is a rigid motion. 

Another class of rigid motions are the translations. A function $g: V \to V$,
where V is a real inner product space, is called a translation if there exists
a vector $v_0 \in V$ such that $g(x) = x + v_0$ for all $x \in V$. We say that g is
the translation by $v_0$. It is a simple exercise to show that translations, as
well as composites of rigid motions on a real inner product space, are also 
rigid motions. (See Exercise 22.) Thus an orthogonal operator on a finite- 
dimensional real inner product space V followed by a translation on V is a 
rigid motion on V. Remarkably, every rigid motion on V may be characterized 
in this way. 


Theorem 6.22. Let $f: V \to V$ be a rigid motion on a finite-dimensional
real inner product space V. Then there exists a unique orthogonal operator
T on V and a unique translation g on V such that $f = g \circ T$.


Any orthogonal operator is a special case of this composite, in which 
the translation is by 0. Any translation is also a special case, in which the 
orthogonal operator is the identity operator. 


Proof. Let $T: V \to V$ be defined by

$$T(x) = f(x) - f(0)$$

for all $x \in V$. We show that T is an orthogonal operator, from which it
follows that $f = g \circ T$, where g is the translation by $f(0)$. Observe that T is
the composite of f and the translation by $-f(0)$; hence T is a rigid motion.
Furthermore, for any $x \in V$,

$$\|T(x)\|^2 = \|f(x) - f(0)\|^2 = \|x - 0\|^2 = \|x\|^2,$$

and consequently $\|T(x)\| = \|x\|$ for any $x \in V$. Thus for any $x, y \in V$,

$$\begin{aligned}
\|T(x) - T(y)\|^2 &= \|T(x)\|^2 - 2\langle T(x), T(y)\rangle + \|T(y)\|^2 \\
&= \|x\|^2 - 2\langle T(x), T(y)\rangle + \|y\|^2
\end{aligned}$$

and

$$\|x - y\|^2 = \|x\|^2 - 2\langle x, y\rangle + \|y\|^2.$$

But $\|T(x) - T(y)\|^2 = \|x - y\|^2$; so $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$.
We are now in a position to show that T is a linear transformation. Let
$x, y \in V$, and let $a \in \mathbb{R}$. Then

$$\begin{aligned}
\|T(x + ay) - T(x) - aT(y)\|^2 &= \|[T(x + ay) - T(x)] - aT(y)\|^2 \\
&= \|T(x + ay) - T(x)\|^2 + a^2\|T(y)\|^2 - 2a\langle T(x + ay) - T(x), T(y)\rangle \\
&= \|(x + ay) - x\|^2 + a^2\|y\|^2 - 2a[\langle T(x + ay), T(y)\rangle - \langle T(x), T(y)\rangle] \\
&= a^2\|y\|^2 + a^2\|y\|^2 - 2a[\langle x + ay, y\rangle - \langle x, y\rangle] \\
&= 2a^2\|y\|^2 - 2a[\langle x, y\rangle + a\|y\|^2 - \langle x, y\rangle] \\
&= 0.
\end{aligned}$$

Thus $T(x + ay) = T(x) + aT(y)$, and hence T is linear. Since T also preserves
inner products, T is an orthogonal operator.

To prove uniqueness, suppose that $u_0$ and $v_0$ are in V and T and U are
orthogonal operators on V such that

$$f(x) = T(x) + u_0 = U(x) + v_0$$

for all $x \in V$. Substituting $x = 0$ in the preceding equation yields $u_0 = v_0$,
and hence the translation is unique. This equation, therefore, reduces to
$T(x) = U(x)$ for all $x \in V$, and hence $T = U$. ∎
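The construction in the proof of Theorem 6.22 can be watched on a concrete case. In this sketch, f is an assumed example, built as a rotation of $\mathbb{R}^2$ (angle `THETA`) followed by a translation (vector `V0`); defining $T(x) = f(x) - f(0)$ and g as the translation by $f(0)$, as in the proof, recovers a length-preserving linear T and the translation vector. The checks use sample vectors only, so they illustrate rather than prove.

```python
import math

THETA = 0.9             # assumed rotation angle
V0 = (3.0, -2.0)        # assumed translation vector


def f(x):
    """An example rigid motion of R^2: rotate by THETA, then translate by V0."""
    c, s = math.cos(THETA), math.sin(THETA)
    return (c * x[0] - s * x[1] + V0[0], s * x[0] + c * x[1] + V0[1])


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def sub(x, y):
    return (x[0] - y[0], x[1] - y[1])


def scale(a, x):
    return (a * x[0], a * x[1])


def norm(x):
    return math.hypot(x[0], x[1])


f0 = f((0.0, 0.0))


def T(x):
    """T(x) = f(x) - f(0), as in the proof of Theorem 6.22."""
    return sub(f(x), f0)


def g(x):
    """The translation by f(0)."""
    return add(x, f0)


x, y, a = (1.5, -0.25), (-2.0, 4.0), 2.5
is_rigid = abs(norm(sub(f(x), f(y))) - norm(sub(x, y))) < 1e-12
T_is_linear = norm(sub(T(add(x, scale(a, y))), add(T(x), scale(a, T(y))))) < 1e-12
T_preserves_length = abs(norm(T(x)) - norm(x)) < 1e-12
f_equals_g_after_T = norm(sub(g(T(x)), f(x))) < 1e-12

print(f0, is_rigid, T_is_linear, T_preserves_length, f_equals_g_after_T)
```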


Orthogonal Operators on $\mathbb{R}^2$


Because of Theorem 6.22, an understanding of rigid motions requires a
characterization of orthogonal operators. The next result characterizes orthogonal
operators on $\mathbb{R}^2$. We postpone the case of orthogonal operators on
more general spaces to Section 6.11.


Theorem 6.23. Let T be an orthogonal operator on $\mathbb{R}^2$, and let $A = [T]_\beta$,
where $\beta$ is the standard ordered basis for $\mathbb{R}^2$. Then exactly one of the following
conditions is satisfied: 

(a) T is a rotation, and det(A) = 1. 
(b) T is a reflection about a line through the origin, and det(A) = —1. 


Proof. Because T is an orthogonal operator, $T(\beta) = \{T(e_1), T(e_2)\}$ is an
orthonormal basis for $\mathbb{R}^2$ by Theorem 6.18(c). Since $T(e_1)$ is a unit vector,
there is a unique angle $\theta$, $0 \le \theta < 2\pi$, such that $T(e_1) = (\cos\theta, \sin\theta)$. Since
$T(e_2)$ is a unit vector and is orthogonal to $T(e_1)$, there are only two possible
choices for $T(e_2)$. Either

$$T(e_2) = (-\sin\theta, \cos\theta) \quad\text{or}\quad T(e_2) = (\sin\theta, -\cos\theta).$$

First, suppose that $T(e_2) = (-\sin\theta, \cos\theta)$. Then

$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

It follows from Example 1 of Section 6.4 that T is a rotation by the angle $\theta$.
Also

$$\det(A) = \cos^2\theta + \sin^2\theta = 1.$$

Now suppose that $T(e_2) = (\sin\theta, -\cos\theta)$. Then

$$A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$

Comparing this matrix to the matrix A of Example 5, we see that T is the
reflection of $\mathbb{R}^2$ about a line L, so that $\alpha = \theta/2$ is the angle from the positive
x-axis to L. Furthermore,

$$\det(A) = -\cos^2\theta - \sin^2\theta = -1.$$

∎
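The proof of Theorem 6.23 is effectively an algorithm: read $\theta$ off the first column $T(e_1)$, then let the determinant decide between a rotation and a reflection. A minimal sketch follows; the function name and the tolerance are illustrative assumptions, and the input is assumed to be (numerically) orthogonal.

```python
import math


def classify_orthogonal_2x2(A, tol=1e-9):
    """Classify an orthogonal 2x2 matrix [[a, b], [c, d]] as in the proof of Theorem 6.23.

    Returns ("rotation", theta) if A is the rotation by theta, or
    ("reflection", alpha) if A is the reflection about the line through the
    origin at angle alpha from the positive x-axis.
    """
    (a, b), (c, d) = A
    # The columns must be orthonormal (A^t A = I).
    if (abs(a * a + c * c - 1) > tol or abs(b * b + d * d - 1) > tol
            or abs(a * b + c * d) > tol):
        raise ValueError("matrix is not orthogonal")
    # First column is T(e1) = (cos theta, sin theta), with 0 <= theta < 2*pi.
    theta = math.atan2(c, a) % (2 * math.pi)
    det = a * d - b * c
    if det > 0:                      # det = 1: second column is (-sin theta, cos theta)
        return "rotation", theta
    return "reflection", theta / 2   # det = -1: by Example 5, alpha = theta / 2


print(classify_orthogonal_2x2([[0.0, -1.0], [1.0, 0.0]]))  # rotation by pi/2
print(classify_orthogonal_2x2([[0.0, 1.0], [1.0, 0.0]]))   # reflection about y = x
```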


Combining Theorems 6.22 and 6.23, we obtain the following characterization
of rigid motions on $\mathbb{R}^2$.


Corollary. Any rigid motion on R? is either a rotation followed by a trans- 
lation or a reflection about a line through the origin followed by a translation. 


Example 7
Let

$$A = \begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\[2pt] \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \end{pmatrix}.$$

We show that $L_A$ is the reflection of $\mathbb{R}^2$ about a line L through the origin, and
then describe L.

Clearly $AA^* = A^*A = I$, and therefore A is an orthogonal matrix. Hence
$L_A$ is an orthogonal operator. Furthermore,

$$\det(A) = -\frac{1}{5} - \frac{4}{5} = -1,$$

and thus $L_A$ is a reflection of $\mathbb{R}^2$ about a line L through the origin by Theorem 6.23.
Since L is the one-dimensional eigenspace corresponding to the
eigenvalue 1 of $L_A$, it suffices to find an eigenvector of $L_A$ corresponding to 1.
One such vector is $v = (2, \sqrt{5} - 1)$. Thus L is the span of $\{v\}$. Alternatively,
L is the line through the origin with slope $(\sqrt{5} - 1)/2$, and hence is the line
with the equation

$$y = \frac{\sqrt{5} - 1}{2}\,x.$$

Conic Sections 


As an application of Theorem 6.20, we consider the quadratic equation

$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0. \tag{2}$$
For special choices of the coefficients in (2), we obtain the various conic
sections. For example, if $a = c = 1$, $b = d = e = 0$, and $f = -1$, we
obtain the circle $x^2 + y^2 = 1$ with center at the origin. The remaining
conic sections, namely, the ellipse, parabola, and hyperbola, are obtained
by other choices of the coefficients. If $b = 0$, then it is easy to graph the
equation by the method of completing the square because the xy-term is
absent. For example, the equation $x^2 + 2x + y^2 + 4y + 2 = 0$ may be rewritten
as $(x + 1)^2 + (y + 2)^2 = 3$, which describes a circle with radius $\sqrt{3}$ and center
at $(-1, -2)$ in the xy-coordinate system. If we consider the transformation
of coordinates $(x, y) \to (x', y')$, where $x' = x + 1$ and $y' = y + 2$, then our
equation simplifies to $(x')^2 + (y')^2 = 3$. This change of variable allows us to
eliminate the x- and y-terms.

We now concentrate solely on the elimination of the xy-term. To accomplish
this, we consider the expression

$$ax^2 + 2bxy + cy^2, \tag{3}$$

which is called the associated quadratic form of (2). Quadratic forms are
studied in more generality in Section 6.8.


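If we let

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix} \quad\text{and}\quad X = \begin{pmatrix} x \\ y \end{pmatrix},$$

then (3) may be written as $X^tAX = \langle AX, X\rangle$. For example, the quadratic
form $3x^2 + 4xy + 6y^2$ may be written as

$$X^t \begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix} X.$$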

The fact that A is symmetric is crucial in our discussion. For, by Theorem 6.20,
we may choose an orthogonal matrix P and a diagonal matrix D
with real diagonal entries $\lambda_1$ and $\lambda_2$ such that $P^tAP = D$. Now define

$$X' = \begin{pmatrix} x' \\ y' \end{pmatrix}$$

by $X' = P^tX$ or, equivalently, by $PX' = PP^tX = X$. Then

$$X^tAX = (PX')^tA(PX') = X'^t(P^tAP)X' = X'^tDX' = \lambda_1(x')^2 + \lambda_2(y')^2.$$

Thus the transformation $(x, y) \to (x', y')$ allows us to eliminate the xy-term
in (3), and hence in (2).

Furthermore, since P is orthogonal, we have by Theorem 6.23 (with $T = L_P$)
that $\det(P) = \pm 1$. If $\det(P) = -1$, we may interchange the columns
of P to obtain a matrix Q. Because the columns of P form an orthonormal
basis of eigenvectors of A, the same is true of the columns of Q. Therefore,

$$Q^tAQ = \begin{pmatrix} \lambda_2 & 0 \\ 0 & \lambda_1 \end{pmatrix}.$$

Notice that $\det(Q) = -\det(P) = 1$. So, if $\det(P) = -1$, we can take Q for
our new P; consequently, we may always choose P so that $\det(P) = 1$. By
Theorem 6.23 (with $T = L_P$), it follows that the matrix P represents
a rotation.

In summary, the xy-term in (2) may be eliminated by a rotation of the
x-axis and y-axis to new axes x' and y' given by $X = PX'$, where P is an
orthogonal matrix and $\det(P) = 1$. Furthermore, the coefficients of $(x')^2$ and
$(y')^2$ are the eigenvalues of

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$

This result is a restatement of a result known as the principal axis theorem
for $\mathbb{R}^2$. The arguments above, of course, are easily extended to quadratic
equations in n variables. For example, in the case $n = 3$, by special choices
of the coefficients, we obtain the quadratic surfaces: the elliptic cone, the
ellipsoid, the hyperbolic paraboloid, etc.
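The principal axis computation summarized above can be sketched for the $2 \times 2$ case. This pure-Python sketch, with illustrative function names, uses the closed-form eigenvalues of the symmetric matrix $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$, takes $(b, \lambda_1 - a)$ as an eigenvector for $\lambda_1$, and returns the angle $\theta$ of a rotation P (so $\det(P) = 1$), normalized to $(-\pi/2, \pi/2]$; `rotated_form` then computes $P^tAP$ to confirm that the xy-term disappears.

```python
import math


def principal_axes(a, b, c):
    """Diagonalize the quadratic form a x^2 + 2 b x y + c y^2 by a rotation.

    Returns (lam1, lam2, theta) such that, for P the rotation by theta,
    P^t A P = diag(lam1, lam2) where A = [[a, b], [b, c]].
    """
    if b == 0:
        return a, c, 0.0                       # no xy-term: nothing to rotate
    mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    lam1, lam2 = mean - radius, mean + radius  # eigenvalues of the symmetric A
    # (b, lam1 - a) solves (A - lam1 I)v = 0; its direction gives the x'-axis.
    theta = math.atan2(lam1 - a, b)
    # Negating both eigenvectors is a rotation by pi, so theta may be kept in (-pi/2, pi/2].
    if theta > math.pi / 2:
        theta -= math.pi
    elif theta <= -math.pi / 2:
        theta += math.pi
    return lam1, lam2, theta


def rotated_form(a, b, c, theta):
    """Entries of P^t A P for P = [[cos t, -sin t], [sin t, cos t]]."""
    co, s = math.cos(theta), math.sin(theta)
    d11 = a * co * co + 2 * b * co * s + c * s * s
    d12 = (c - a) * co * s + b * (co * co - s * s)
    d22 = a * s * s - 2 * b * co * s + c * co * co
    return [[d11, d12], [d12, d22]]


lam1, lam2, theta = principal_axes(2, -2, 5)   # the form 2x^2 - 4xy + 5y^2
print(lam1, lam2, math.degrees(theta))
print(rotated_form(2, -2, 5, theta))
```

For the form $2x^2 - 4xy + 5y^2$ treated next, this gives $\lambda_1 = 1$, $\lambda_2 = 6$ and $\theta = \tan^{-1}\frac{1}{2} = \cos^{-1}\frac{2}{\sqrt{5}} \approx 26.6^\circ$.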

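As an illustration of the preceding transformation, consider the quadratic
equation

$$2x^2 - 4xy + 5y^2 - 36 = 0,$$

for which the associated quadratic form is $2x^2 - 4xy + 5y^2$. In the notation
we have been using,

$$A = \begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix},$$

so that the eigenvalues of A are 1 and 6 with associated eigenvectors

$$\begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1 \\ 2 \end{pmatrix}.$$

As expected (from Theorem 6.15(d) p. 371), these vectors are orthogonal.
The corresponding orthonormal basis of eigenvectors

$$\beta = \left\{ \begin{pmatrix} \frac{2}{\sqrt{5}} \\[2pt] \frac{1}{\sqrt{5}} \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{5}} \\[2pt] \frac{2}{\sqrt{5}} \end{pmatrix} \right\}$$

determines new axes x' and y' as in Figure 6.4. Hence if

$$P = \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\[2pt] \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix},$$

then

$$P^tAP = \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix}.$$

Under the transformation $X = PX'$ or

$$x = \frac{2}{\sqrt{5}}x' - \frac{1}{\sqrt{5}}y', \qquad y = \frac{1}{\sqrt{5}}x' + \frac{2}{\sqrt{5}}y',$$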
we have the new quadratic form $(x')^2 + 6(y')^2$. Thus the original equation
$2x^2 - 4xy + 5y^2 = 36$ may be written in the form $(x')^2 + 6(y')^2 = 36$ relative to
a new coordinate system with the x'- and y'-axes in the directions of the first
and second vectors of $\beta$, respectively. It is clear that this equation represents
an ellipse. (See Figure 6.4.)

[Figure 6.4: the ellipse, with the x- and y-axes and the rotated x'- and y'-axes]

Note that the preceding matrix P has the form

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

where $\theta = \cos^{-1}\frac{2}{\sqrt{5}} \approx 26.6^\circ$. So P is the matrix representation of a rotation
of $\mathbb{R}^2$ through the angle $\theta$. Thus the change of variable $X = PX'$ can be
accomplished by this rotation of the x- and y-axes. There is another possibility
for P, however. If the eigenvector of A corresponding to the eigenvalue 6 is
taken to be $(1, -2)$ instead of $(-1, 2)$, and the eigenvalues are interchanged,
then we obtain the matrix

$$\begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\[2pt] -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix},$$

which is the matrix representation of a rotation through the angle
$\theta = \sin^{-1}\left(-\frac{2}{\sqrt{5}}\right) \approx -63.4^\circ$. This possibility produces the same ellipse as the
one in Figure 6.4, but interchanges the names of the x'- and y'-axes.


EXERCISES 


1. Label the following statements as true or false. Assume that the under- 
lying inner product spaces are finite-dimensional. 


(a) Every unitary operator is normal. 
(b) Every orthogonal operator is diagonalizable. 
(c) A matrix is unitary if and only if it is invertible. 
(d) If two matrices are unitarily equivalent, then they are also similar. 
(e) The sum of unitary matrices is unitary. 
(f) The adjoint of a unitary operator is unitary. 
(g) If T is an orthogonal operator on V, then $[T]_\beta$ is an orthogonal
matrix for any ordered basis $\beta$ for V.
(h) If all the eigenvalues of a linear operator are 1, then the operator 
must be unitary or orthogonal. 
(i) A linear operator may preserve the norm, but not the inner product.


2. For each of the following matrices A, find an orthogonal or unitary 
matrix P and a diagonal matrix D such that P* AP = D. 


(a) € : (b) ( =) (c) Ce ) 
0 2 2 2s Als. 7 

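(d) $\begin{pmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{pmatrix}$ (e) $\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$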


3. Prove that the composite of unitary [orthogonal] operators is unitary
[orthogonal].

4. For $z \in \mathbb{C}$, define $T_z: \mathbb{C} \to \mathbb{C}$ by $T_z(u) = zu$. Characterize those $z$ for
which $T_z$ is normal, self-adjoint, or unitary.


5. Which of the following pairs of matrices are unitarily equivalent?


(a) G i) aad G i) (b) (; and 


NIF © 
oO NF 


£30 2 0 

(c) {-1 0 0] and [0 -1 0 

01 0 0 

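(d) $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & -i \end{pmatrix}$

(e) $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$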


6. Let V be the inner product space of complex-valued continuous functions
on [0, 1] with the inner product

$$\langle f, g\rangle = \int_0^1 f(t)\overline{g(t)}\,dt.$$

Let $h \in V$, and define $T: V \to V$ by $T(f) = hf$. Prove that T is a
unitary operator if and only if $|h(t)| = 1$ for $0 \le t \le 1$.


7. Prove that if T is a unitary operator on a finite-dimensional inner product
space V, then T has a unitary square root; that is, there exists a
unitary operator U such that $T = U^2$.


8. Let T be a self-adjoint linear operator on a finite-dimensional inner
product space. Prove that $(T + iI)(T - iI)^{-1}$ is unitary using Exercise 10
of Section 6.4.


9. Let U be a linear operator on a finite-dimensional inner product space
V. If $\|U(x)\| = \|x\|$ for all x in some orthonormal basis for V, must U
be unitary? Justify your answer with a proof or a counterexample.


10. Let A be an $n \times n$ real symmetric or complex normal matrix. Prove
that

$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i \quad\text{and}\quad \operatorname{tr}(A^*A) = \sum_{i=1}^{n} |\lambda_i|^2,$$

where the $\lambda_i$'s are the (not necessarily distinct) eigenvalues of A.

11. Find an orthogonal matrix whose first row is $(\frac{1}{3}, \frac{2}{3}, \frac{2}{3})$.


12. Let A be an $n \times n$ real symmetric or complex normal matrix. Prove
that

$$\det(A) = \prod_{i=1}^{n} \lambda_i,$$

where the $\lambda_i$'s are the (not necessarily distinct) eigenvalues of A.


13. Suppose that A and B are diagonalizable matrices. Prove or disprove
that A is similar to B if and only if A and B are unitarily equivalent. 


14. Prove that if A and B are unitarily equivalent matrices, then A is positive
definite [semidefinite] if and only if B is positive definite [semidefinite].
(See the definitions in the exercises in Section 6.4.)


15. Let U be a unitary operator on an inner product space V, and let W be
a finite-dimensional U-invariant subspace of V. Prove that

(a) $U(W) = W$;

(b) $W^\perp$ is U-invariant.

Contrast (b) with Exercise 16.


16. Find an example of a unitary operator U on an inner product space and
a U-invariant subspace W such that $W^\perp$ is not U-invariant.


17. Prove that a matrix that is both unitary and upper triangular must be
a diagonal matrix. 


18. Show that “is unitarily equivalent to” is an equivalence relation on
$M_{n\times n}(\mathbb{C})$.


19. Let W be a finite-dimensional subspace of an inner product space V.
By Theorem 6.7 (p. 352) and the exercises of Section 1.3, $V = W \oplus W^\perp$.
Define $U: V \to V$ by $U(v_1 + v_2) = v_1 - v_2$, where $v_1 \in W$ and $v_2 \in W^\perp$.
Prove that U is a self-adjoint unitary operator.


20. Let V be a finite-dimensional inner product space. A linear operator U
on V is called a partial isometry if there exists a subspace W of V
such that $\|U(x)\| = \|x\|$ for all $x \in W$ and $U(x) = 0$ for all $x \in W^\perp$.
Observe that W need not be U-invariant. Suppose that U is such an
operator and $\{v_1, v_2, \ldots, v_k\}$ is an orthonormal basis for W. Prove the
following results.

(a) $\langle U(x), U(y)\rangle = \langle x, y\rangle$ for all $x, y \in W$. Hint: Use Exercise 20 of
Section 6.1.
(b) $\{U(v_1), U(v_2), \ldots, U(v_k)\}$ is an orthonormal basis for R(U).


(c) There exists an orthonormal basis $\gamma$ for V such that the first
k columns of $[U]_\gamma$ form an orthonormal set and the remaining
columns are zero.

(d) Let $\{w_1, w_2, \ldots, w_j\}$ be an orthonormal basis for $R(U)^\perp$ and
$\beta = \{U(v_1), U(v_2), \ldots, U(v_k), w_1, \ldots, w_j\}$. Then $\beta$ is an orthonormal
basis for V.

(e) Let T be the linear operator on V that satisfies $T(U(v_i)) = v_i$
($1 \le i \le k$) and $T(w_i) = 0$ ($1 \le i \le j$). Then T is well defined,
and $T = U^*$. Hint: Show that $\langle U(x), y\rangle = \langle x, T(y)\rangle$ for all $x, y \in \beta$.
There are four cases.

(f) $U^*$ is a partial isometry.


This exercise is continued in Exercise 9 of Section 6.6.


21. Let A and B be $n \times n$ matrices that are unitarily equivalent.

(a) Prove that $\operatorname{tr}(A^*A) = \operatorname{tr}(B^*B)$.
(b) Use (a) to prove that

$$\sum_{i,j=1}^{n} |A_{ij}|^2 = \sum_{i,j=1}^{n} |B_{ij}|^2.$$


(c) Use (b) to show that the matrices 


Gy ee Aa 


are not unitarily equivalent. 


22. Let V be a real inner product space.


(a) Prove that any translation on V is a rigid motion. 
(b) Prove that the composite of any two rigid motions on V is a rigid 
motion on V. 


23. Prove the following variation of Theorem 6.22: If $f: V \to V$ is a rigid
motion on a finite-dimensional real inner product space V, then there
exists a unique orthogonal operator T on V and a unique translation g
on V such that $f = T \circ g$.


24. Let T and U be orthogonal operators on $\mathbb{R}^2$. Use Theorem 6.23 to prove
the following results.

(a) If T and U are both reflections about lines through the origin, then
UT is a rotation.

(b) If T is a rotation and U is a reflection about a line through the
origin, then both UT and TU are reflections about lines through
the origin.

25. Suppose that T and U are reflections of $\mathbb{R}^2$ about the respective lines
L and L' through the origin and that $\phi$ and $\psi$ are the angles from
the positive x-axis to L and L', respectively. By Exercise 24, UT is a
rotation. Find its angle of rotation.


26. Suppose that T and U are orthogonal operators on $\mathbb{R}^2$ such that T is
the rotation by the angle $\phi$ and U is the reflection about the line L
through the origin. Let $\psi$ be the angle from the positive x-axis to L.
By Exercise 24, both UT and TU are reflections about lines $L_1$ and $L_2$,
respectively, through the origin.

(a) Find the angle $\theta$ from the positive x-axis to $L_1$.

(b) Find the angle $\theta$ from the positive x-axis to $L_2$.


27. Find new coordinates x', y' so that the following quadratic forms can
be written as $\lambda_1(x')^2 + \lambda_2(y')^2$.

(a) $x^2 + 4xy + y^2$

(b) $2x^2 + 2xy + 2y^2$

(c) $x^2 - 12xy - 4y^2$

(d) $3x^2 + 2xy + 3y^2$

(e) $x^2 - 2xy + y^2$


28. Consider the expression $X^tAX$, where $X^t = (x, y, z)$ and A is as defined
in Exercise 2(e). Find a change of coordinates x', y', z' so that the
preceding expression is of the form $\lambda_1(x')^2 + \lambda_2(y')^2 + \lambda_3(z')^2$.


29. QR-Factorization. Let $w_1, w_2, \ldots, w_n$ be linearly independent vectors
in $F^n$, and let $v_1, v_2, \ldots, v_n$ be the orthogonal vectors obtained from
$w_1, w_2, \ldots, w_n$ by the Gram–Schmidt process. Let $u_1, u_2, \ldots, u_n$ be the
orthonormal basis obtained by normalizing the $v_i$'s.

(a) Solving (1) in Section 6.2 for $w_k$ in terms of $u_k$, show that

$$w_k = \|v_k\|\,u_k + \sum_{j=1}^{k-1} \langle w_k, u_j\rangle\, u_j \qquad (1 \le k \le n).$$

(b) Let A and Q denote the $n \times n$ matrices in which the kth columns
are $w_k$ and $u_k$, respectively. Define $R \in M_{n\times n}(F)$ by

$$R_{jk} = \begin{cases} \|v_j\| & \text{if } j = k \\ \langle w_k, u_j\rangle & \text{if } j < k \\ 0 & \text{if } j > k. \end{cases}$$

Prove $A = QR$.
(c) Compute Q and R as in (b) for the $3 \times 3$ matrix whose columns are
the vectors $w_1, w_2, w_3$, respectively, in Example 4 of Section 6.2.

(d) Since Q is unitary [orthogonal] and R is upper triangular in (b),
we have shown that every invertible matrix is the product of a unitary
[orthogonal] matrix and an upper triangular matrix. Suppose
that $A \in M_{n\times n}(F)$ is invertible and $A = Q_1R_1 = Q_2R_2$, where
$Q_1, Q_2 \in M_{n\times n}(F)$ are unitary and $R_1, R_2 \in M_{n\times n}(F)$ are upper
triangular. Prove that $D = R_2R_1^{-1}$ is a unitary diagonal matrix.
Hint: Use Exercise 17.

(e) The QR factorization described in (b) provides an orthogonalization
method for solving a linear system $Ax = b$ when A is invertible.
Decompose A to QR, by the Gram–Schmidt process or
other means, where Q is unitary and R is upper triangular. Then
$QRx = b$, and hence $Rx = Q^*b$. This last system can be easily
solved since R is upper triangular.¹

Use the orthogonalization method and (c) to solve the system 


$$\begin{aligned} x_1 + 2x_2 + 2x_3 &= 1 \\ x_1 + 2x_3 &= 11 \\ x_2 + x_3 &= -1. \end{aligned}$$


30. Suppose that $\beta$ and $\gamma$ are ordered bases for an n-dimensional real [complex]
inner product space V. Prove that if Q is an orthogonal [unitary]
$n \times n$ matrix that changes $\gamma$-coordinates into $\beta$-coordinates, then $\beta$ is
orthonormal if and only if $\gamma$ is orthonormal.


The following definition is used in Exercises 31 and 32. 


Definition. Let V be a finite-dimensional complex [real] inner product
space, and let u be a unit vector in V. Define the Householder operator
$H_u: V \to V$ by $H_u(x) = x - 2\langle x, u\rangle u$ for all $x \in V$.


31. Let $H_u$ be a Householder operator on a finite-dimensional inner product
space V. Prove the following results.

(a) $H_u$ is linear.

(b) $H_u(x) = x$ if and only if x is orthogonal to u.

(c) $H_u(u) = -u$.

(d) $H_u^* = H_u$ and $H_u^2 = I$, and hence $H_u$ is a unitary [orthogonal]
operator on V.


(Note: If V is a real inner product space, then in the language of Section 6.11,
$H_u$ is a reflection.)


†At one time, because of its great stability, this method for solving large systems of linear equations with a computer was being advocated as a better method than Gaussian elimination even though it requires about three times as much work. (Later, however, J. H. Wilkinson showed that if Gaussian elimination is done “properly,” then it is nearly as stable as the orthogonalization method.)
32. Let $V$ be a finite-dimensional inner product space over $F$. Let $x$ and $y$ be linearly independent vectors in $V$ such that $\|x\| = \|y\|$.
(a) If $F = C$, prove that there exists a unit vector $u$ in $V$ and a complex number $\theta$ with $|\theta| = 1$ such that $H_u(x) = \theta y$. Hint: Choose $\theta$ so that $\langle x, \theta y\rangle$ is real, and set $u = \dfrac{1}{\|x - \theta y\|}(x - \theta y)$.
(b) If $F = R$, prove that there exists a unit vector $u$ in $V$ such that $H_u(x) = y$.
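In the real case, the claims of Exercises 31 and 32(b) can be spot-checked numerically. The Python sketch below builds $H_u$ directly from the definition. For Exercise 32(b) it uses the vector from the hint with $\theta = 1$, that is, $u = (x - y)/\|x - y\|$. The function names and test vectors are our own choices, and a numerical check does not replace the proofs asked for.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def householder(u):
    """Return H_u(x) = x - 2<x, u>u, assuming u is a real unit vector."""
    def H(x):
        c = 2 * dot(x, u)
        return [xi - c * ui for xi, ui in zip(x, u)]
    return H


def unit_vector_sending(x, y):
    """Hint of Exercise 32 with theta = 1: for real x, y with ||x|| = ||y||
    and x != y, u = (x - y)/||x - y|| satisfies H_u(x) = y."""
    d = [a - b for a, b in zip(x, y)]
    norm = math.sqrt(dot(d, d))
    return [di / norm for di in d]


u = [1 / 3, 2 / 3, 2 / 3]       # a unit vector in R^3 (our choice)
H = householder(u)
z = [2.0, -1.0, 0.0]            # orthogonal to u, so H(z) should equal z
x = [3.0, 4.0, 0.0]
y = [0.0, 0.0, 5.0]             # ||x|| = ||y|| = 5, and x, y are independent
H_xy = householder(unit_vector_sending(x, y))

print(H(u))       # approximately -u             (Exercise 31(c))
print(H(z))       # z                            (Exercise 31(b))
print(H(H(x)))    # approximately x, H_u^2 = I   (Exercise 31(d))
print(H_xy(x))    # approximately y              (Exercise 32(b))
```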


6.6 ORTHOGONAL PROJECTIONS 
AND THE SPECTRAL THEOREM 


In this section, we rely heavily on Theorems 6.16 (p. 372) and 6.17 (p. 374) to develop an elegant representation of a normal (if $F = C$) or a self-adjoint (if $F = R$) operator $T$ on a finite-dimensional inner product space. We prove that $T$ can be written in the form $\lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$, where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the distinct eigenvalues of $T$ and $T_1, T_2, \ldots, T_k$ are orthogonal projections. We must first develop some results about these special projections.

We assume that the reader is familiar with the results about direct sums 
developed at the end of Section 5.2. The special case where V is a direct sum 
of two subspaces is considered in the exercises of Section 1.3. 

Recall from the exercises of Section 2.1 that if $V = W_1 \oplus W_2$, then a linear operator $T$ on $V$ is the projection on $W_1$ along $W_2$ if, whenever $x = x_1 + x_2$, with $x_1 \in W_1$ and $x_2 \in W_2$, we have $T(x) = x_1$. By Exercise 26 of Section 2.1, we have

$$R(T) = W_1 = \{x \in V : T(x) = x\} \quad\text{and}\quad N(T) = W_2.$$

So $V = R(T) \oplus N(T)$. Thus there is no ambiguity if we refer to $T$ as a “projection on $W_1$” or simply as a “projection.” In fact, it can be shown (see Exercise 17 of Section 2.3) that $T$ is a projection if and only if $T = T^2$. Because $V = W_1 \oplus W_2 = W_1 \oplus W_3$ does not imply that $W_2 = W_3$, we see that $W_1$ does not uniquely determine $T$. For an orthogonal projection $T$, however, $T$ is uniquely determined by its range.


Definition. Let $V$ be an inner product space, and let $T\colon V \to V$ be a projection. We say that $T$ is an orthogonal projection if $R(T)^\perp = N(T)$ and $N(T)^\perp = R(T)$.


Note that by Exercise 13(c) of Section 6.2, if $V$ is finite-dimensional, we need only assume that one of the preceding conditions holds. For example, if $R(T)^\perp = N(T)$, then $R(T) = R(T)^{\perp\perp} = N(T)^\perp$.

Now assume that $W$ is a finite-dimensional subspace of an inner product space $V$. In the notation of Theorem 6.6 (p. 350), where each $y \in V$ is written uniquely as $y = u + z$ with $u \in W$ and $z \in W^\perp$, we can define a function $T\colon V \to V$ by $T(y) = u$. It is easy to show that $T$ is an orthogonal projection on $W$. We can say even more: there exists exactly one orthogonal projection on $W$. For if $T$ and $U$ are orthogonal projections on $W$, then $R(T) = W = R(U)$. Hence $N(T) = R(T)^\perp = R(U)^\perp = N(U)$, and since every projection is uniquely determined by its range and null space, we have $T = U$. We call $T$ the orthogonal projection of $V$ on $W$.

To understand the geometric difference between an arbitrary projection on $W$ and the orthogonal projection on $W$, let $V = R^2$ and $W = \mathrm{span}\{(1, 1)\}$. Define $U$ and $T$ as in Figure 6.5, where $T(v)$ is the foot of a perpendicular from $v$ on the line $y = x$ and $U(a_1, a_2) = (a_1, a_1)$. Then $T$ is the orthogonal projection of $V$ on $W$, and $U$ is a different projection on $W$. Note that $v - T(v) \in W^\perp$, whereas $v - U(v) \notin W^\perp$.
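The contrast between $T$ and $U$ can be checked with exact rational arithmetic. In this Python sketch, the helper names and the test vector $v = (3, 1)$ are our own choices. $T$ is computed with the standard formula $T(v) = \frac{\langle v, w\rangle}{\langle w, w\rangle}w$ for the orthogonal projection on the span of a single nonzero vector $w$, and $U$ is the projection from the text.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


w = (1, 1)  # W = span{(1, 1)}


def T(v):
    """Orthogonal projection of R^2 on W: (<v, w>/<w, w>) w."""
    c = Fraction(dot(v, w), dot(w, w))
    return tuple(c * wi for wi in w)


def U(v):
    """The (non-orthogonal) projection U(a1, a2) = (a1, a1) on W."""
    return (v[0], v[0])


def residual(v, P):
    return tuple(a - b for a, b in zip(v, P(v)))


v = (3, 1)
print(dot(residual(v, T), w))  # 0: v - T(v) lies in W-perp
print(dot(residual(v, U), w))  # -2: v - U(v) does not
```

The test below also spot-checks the best-approximation property: no sampled point of $W$ is closer to $v$ than $T(v)$.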


Figure 6.5 


From Figure 6.5, we see that $T(v)$ is the “best approximation in $W$ to $v$”; that is, if $w \in W$, then $\|w - v\| \ge \|T(v) - v\|$. In fact, this approximation property characterizes $T$. These results follow immediately from the corollary to Theorem 6.6 (p. 350).

As an application to Fourier analysis, recall the inner product space $H$ and the orthonormal set $S$ in Example 9 of Section 6.1. Define a trigonometric polynomial of degree $n$ to be a function $g \in H$ of the form

$$g(t) = \sum_{j=-n}^{n} a_jf_j(t) = \sum_{j=-n}^{n} a_je^{ijt},$$

where $a_n$ or $a_{-n}$ is nonzero.
Let $f \in H$. We show that the best approximation to $f$ by a trigonometric polynomial of degree less than or equal to $n$ is the trigonometric polynomial whose coefficients are the Fourier coefficients of $f$ relative to the orthonormal set $S$. For this result, let $W = \mathrm{span}(\{f_j : |j| \le n\})$, and let $T$ be the orthogonal projection of $H$ on $W$. The corollary to Theorem 6.6 (p. 350) tells us that the best approximation to $f$ by a function in $W$ is

$$T(f) = \sum_{j=-n}^{n} \langle f, f_j\rangle f_j.$$
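The best-approximation property can be observed numerically. The Python sketch below assumes the inner product on $H$ is $\langle f, g\rangle = \frac{1}{2\pi}\int_0^{2\pi} f(t)\overline{g(t)}\,dt$ with $f_j(t) = e^{ijt}$. It approximates each integral by a midpoint sum with `N` points. The test function $f(t) = t$, the value of `N`, and all helper names are illustrative choices. Raising the degree should reduce the distance to $f$, and perturbing a Fourier coefficient should only increase it.

```python
import cmath
import math

N = 2000  # number of quadrature points (an arbitrary choice)


def inner(f, g):
    """Midpoint-rule approximation of (1/2pi) * integral_0^{2pi} f(t) conj(g(t)) dt."""
    h = 2 * math.pi / N
    return sum(f((k + 0.5) * h) * g((k + 0.5) * h).conjugate() for k in range(N)) / N


def f_j(j):
    """The orthonormal function f_j(t) = e^{ijt}."""
    return lambda t: cmath.exp(1j * j * t)


def best_approximation(f, n):
    """Return the Fourier coefficients <f, f_j>, |j| <= n, and the polynomial they define."""
    coeffs = {j: inner(f, f_j(j)) for j in range(-n, n + 1)}

    def g(t):
        return sum(c * cmath.exp(1j * j * t) for j, c in coeffs.items())

    return coeffs, g


def distance(f, g):
    diff = lambda t: f(t) - g(t)
    return math.sqrt(inner(diff, diff).real)


f = lambda t: t
_, g1 = best_approximation(f, 1)
coeffs, g3 = best_approximation(f, 3)
perturbed = lambda t: g3(t) + 0.1 * cmath.exp(1j * t)   # changes a_1 by 0.1
print(distance(f, g1), distance(f, g3), distance(f, perturbed))
```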
An algebraic characterization of orthogonal projections follows in the next theorem.


Theorem 6.24. Let $V$ be an inner product space, and let $T$ be a linear operator on $V$. Then $T$ is an orthogonal projection if and only if $T$ has an adjoint $T^*$ and $T^2 = T = T^*$.


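Proof. Suppose that $T$ is an orthogonal projection. Since $T^2 = T$ because $T$ is a projection, we need only show that $T^*$ exists and $T = T^*$. Now $V = R(T) \oplus N(T)$ and $R(T)^\perp = N(T)$. Let $x, y \in V$. Then we can write $x = x_1 + x_2$ and $y = y_1 + y_2$, where $x_1, y_1 \in R(T)$ and $x_2, y_2 \in N(T)$. Hence

$$\langle x, T(y)\rangle = \langle x_1 + x_2, y_1\rangle = \langle x_1, y_1\rangle + \langle x_2, y_1\rangle = \langle x_1, y_1\rangle$$

and

$$\langle T(x), y\rangle = \langle x_1, y_1 + y_2\rangle = \langle x_1, y_1\rangle + \langle x_1, y_2\rangle = \langle x_1, y_1\rangle.$$

So $\langle x, T(y)\rangle = \langle T(x), y\rangle$ for all $x, y \in V$; thus $T^*$ exists and $T = T^*$.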

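Now suppose that $T^2 = T = T^*$. Then $T$ is a projection by Exercise 17 of Section 2.3, and hence we must show that $R(T) = N(T)^\perp$ and $R(T)^\perp = N(T)$. Let $x \in R(T)$ and $y \in N(T)$. Then $x = T(x) = T^*(x)$, and so

$$\langle x, y\rangle = \langle T^*(x), y\rangle = \langle x, T(y)\rangle = \langle x, 0\rangle = 0.$$

Therefore $x \in N(T)^\perp$, from which it follows that $R(T) \subseteq N(T)^\perp$.

Let $y \in N(T)^\perp$. We must show that $y \in R(T)$, that is, $T(y) = y$. Now

$$\|y - T(y)\|^2 = \langle y - T(y), y - T(y)\rangle = \langle y, y - T(y)\rangle - \langle T(y), y - T(y)\rangle.$$

Since $T(y - T(y)) = T(y) - T^2(y) = 0$, we have $y - T(y) \in N(T)$, and because $y \in N(T)^\perp$, the first term must equal zero. But also

$$\langle T(y), y - T(y)\rangle = \langle y, T^*(y - T(y))\rangle = \langle y, T(y - T(y))\rangle = \langle y, 0\rangle = 0.$$

Thus $y - T(y) = 0$; that is, $y = T(y) \in R(T)$. Hence $R(T) = N(T)^\perp$.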

Using the preceding results, we have $R(T)^\perp = N(T)^{\perp\perp} \supseteq N(T)$ by Exercise 13(b) of Section 6.2. Now suppose that $x \in R(T)^\perp$. For any $y \in V$, we have $\langle T(x), y\rangle = \langle x, T^*(y)\rangle = \langle x, T(y)\rangle = 0$. So $T(x) = 0$, and thus $x \in N(T)$. Hence $R(T)^\perp = N(T)$. ∎
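Theorem 6.24 is easy to test on matrices. With respect to the standard (orthonormal) basis of $R^n$, the adjoint corresponds to the transpose, so an orthogonal projection should have a matrix $P$ with $P^2 = P = P^t$. The Python sketch below uses the standard formula $ww^t/\langle w, w\rangle$ for the orthogonal projection on $\mathrm{span}\{w\}$. The vector $w = (1, 2, 2)$ is an arbitrary choice, and the helper names are our own. For contrast, it includes the projection $U(a_1, a_2) = (a_1, a_1)$ from Figure 6.5, which is idempotent but not symmetric.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def line_projection(w):
    """Standard matrix of the orthogonal projection of R^n on span{w}: w w^t / <w, w>."""
    s = sum(x * x for x in w)
    return [[Fraction(a * b, s) for b in w] for a in w]


P = line_projection([1, 2, 2])   # orthogonal projection on span{(1, 2, 2)}
U = [[1, 0], [1, 0]]             # U(a1, a2) = (a1, a1) in the standard basis

print(matmul(P, P) == P, transpose(P) == P)   # True True
print(matmul(U, U) == U, transpose(U) == U)   # True False
```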
Let $V$ be a finite-dimensional inner product space, $W$ be a subspace of $V$, and $T$ be the orthogonal projection of $V$ on $W$. We may choose an orthonormal basis $\beta = \{v_1, v_2, \ldots, v_n\}$ for $V$ such that $\{v_1, v_2, \ldots, v_k\}$ is a basis for $W$. Then $[T]_\beta$ is a diagonal matrix with ones as the first $k$ diagonal entries and zeros elsewhere. In fact, $[T]_\beta$ has the form

$$\begin{pmatrix} I_k & O_1\\ O_2 & O_3 \end{pmatrix}.$$

If $U$ is any projection on $W$, we may choose a basis $\gamma$ for $V$ such that $[U]_\gamma$ has the form above; however $\gamma$ is not necessarily orthonormal.
We are now ready for the principal theorem of this section. 


Theorem 6.25 (The Spectral Theorem). Suppose that $T$ is a linear operator on a finite-dimensional inner product space $V$ over $F$ with the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Assume that $T$ is normal if $F = C$ and that $T$ is self-adjoint if $F = R$. For each $i$ $(1 \le i \le k)$, let $W_i$ be the eigenspace of $T$ corresponding to the eigenvalue $\lambda_i$, and let $T_i$ be the orthogonal projection of $V$ on $W_i$. Then the following statements are true.

(a) $V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$.

(b) If $W_i'$ denotes the direct sum of the subspaces $W_j$ for $j \ne i$, then $W_i^\perp = W_i'$.

(c) $T_iT_j = \delta_{ij}T_i$ for $1 \le i, j \le k$.

(d) $I = T_1 + T_2 + \cdots + T_k$.

(e) $T = \lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$.


Proof. (a) By Theorems 6.16 (p. 372) and 6.17 (p. 374), $T$ is diagonalizable; so

$$V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$$

by Theorem 5.11 (p. 278).

(b) If $x \in W_i$ and $y \in W_j$ for some $i \ne j$, then $\langle x, y\rangle = 0$ by Theorem 6.15(d) (p. 371). It follows easily from this result that $W_i' \subseteq W_i^\perp$. From (a), we have

$$\dim(W_i') = \sum_{j \ne i} \dim(W_j) = \dim(V) - \dim(W_i).$$

On the other hand, we have $\dim(W_i^\perp) = \dim(V) - \dim(W_i)$ by Theorem 6.7(c) (p. 352). Hence $W_i' = W_i^\perp$, proving (b).

(c) The proof of (c) is left as an exercise.

(d) Since $T_i$ is the orthogonal projection of $V$ on $W_i$, it follows from (b) that $N(T_i) = R(T_i)^\perp = W_i^\perp = W_i'$. Hence, for $x \in V$, we have $x = x_1 + x_2 + \cdots + x_k$, where $T_i(x) = x_i \in W_i$, proving (d).
(e) For $x \in V$, write $x = x_1 + x_2 + \cdots + x_k$, where $x_i \in W_i$. Then

$$\begin{aligned}
T(x) &= T(x_1) + T(x_2) + \cdots + T(x_k)\\
&= \lambda_1x_1 + \lambda_2x_2 + \cdots + \lambda_kx_k\\
&= \lambda_1T_1(x) + \lambda_2T_2(x) + \cdots + \lambda_kT_k(x)\\
&= (\lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k)(x). \qquad ∎
\end{aligned}$$


The set $\{\lambda_1, \lambda_2, \ldots, \lambda_k\}$ of eigenvalues of $T$ is called the spectrum of $T$, the sum $I = T_1 + T_2 + \cdots + T_k$ in (d) is called the resolution of the identity operator induced by $T$, and the sum $T = \lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$ in (e) is called the spectral decomposition of $T$. The spectral decomposition of $T$ is unique up to the order of its eigenvalues.
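Here is a small exact check of the spectral theorem, written as a Python sketch with helper names of our own. The self-adjoint matrix $A$ below is our own example. It has eigenvalue 4 with eigenspace spanned by $(1, 1, 1)$, and eigenvalue 1 with a two-dimensional eigenspace spanned by the orthogonal vectors $(1, -1, 0)$ and $(1, 1, -2)$. These eigenpairs are assumed rather than computed. Each $T_i$ is built as a sum of $vv^t/\langle v, v\rangle$ over an orthogonal basis of $W_i$, and the code then checks (c), (d), and (e) for $L_A$.

```python
from fractions import Fraction

n = 3
A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]          # self-adjoint example (our choice)
I = [[int(i == j) for j in range(n)] for i in range(n)]
Z = [[0] * n for _ in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def combine(*terms):
    """Linear combination sum(c * M) of (c, M) pairs."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]


def projection(orthogonal_basis):
    """Orthogonal projection on the span of pairwise orthogonal nonzero vectors."""
    terms = []
    for v in orthogonal_basis:
        s = sum(x * x for x in v)
        terms.append((Fraction(1, s), [[a * b for b in v] for a in v]))
    return combine(*terms)


# Eigenvalue -> orthogonal basis of its eigenspace (assumed known, not computed here)
eigenspaces = {4: [(1, 1, 1)], 1: [(1, -1, 0), (1, 1, -2)]}
T = {lam: projection(basis) for lam, basis in eigenspaces.items()}

assert matmul(T[4], T[1]) == Z and matmul(T[1], T[4]) == Z       # (c), i != j
assert matmul(T[4], T[4]) == T[4] and matmul(T[1], T[1]) == T[1]  # (c), i = j
assert combine((1, T[4]), (1, T[1])) == I                        # (d)
assert combine((4, T[4]), (1, T[1])) == A                        # (e)
```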

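With the preceding notation, let $\beta$ be the union of orthonormal bases of the $W_i$'s and let $m_i = \dim(W_i)$. (Thus $m_i$ is the multiplicity of $\lambda_i$.) Then $[T]_\beta$ has the form

$$\begin{pmatrix}
\lambda_1I_{m_1} & O & \cdots & O\\
O & \lambda_2I_{m_2} & \cdots & O\\
\vdots & \vdots & & \vdots\\
O & O & \cdots & \lambda_kI_{m_k}
\end{pmatrix};$$

that is, $[T]_\beta$ is a diagonal matrix in which the diagonal entries are the eigenvalues $\lambda_i$ of $T$, and each $\lambda_i$ is repeated $m_i$ times. If $\lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$ is the spectral decomposition of $T$, then it follows (from Exercise 7) that $g(T) = g(\lambda_1)T_1 + g(\lambda_2)T_2 + \cdots + g(\lambda_k)T_k$ for any polynomial $g$. This fact is used below.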

We now list several interesting corollaries of the spectral theorem; many more results are found in the exercises. For what follows, we assume that $T$ is a linear operator on a finite-dimensional inner product space $V$ over $F$.


Corollary 1. If $F = C$, then $T$ is normal if and only if $T^* = g(T)$ for some polynomial $g$.

Proof. Suppose first that $T$ is normal. Let $T = \lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$ be the spectral decomposition of $T$. Taking the adjoint of both sides of the preceding equation, we have $T^* = \overline{\lambda_1}T_1 + \overline{\lambda_2}T_2 + \cdots + \overline{\lambda_k}T_k$ since each $T_i$ is self-adjoint. Using the Lagrange interpolation formula (see page 52), we may choose a polynomial $g$ such that $g(\lambda_i) = \overline{\lambda_i}$ for $1 \le i \le k$. Then

$$g(T) = g(\lambda_1)T_1 + g(\lambda_2)T_2 + \cdots + g(\lambda_k)T_k = \overline{\lambda_1}T_1 + \overline{\lambda_2}T_2 + \cdots + \overline{\lambda_k}T_k = T^*.$$

Conversely, if $T^* = g(T)$ for some polynomial $g$, then $T$ commutes with $T^*$ since $T$ commutes with every polynomial in $T$. So $T$ is normal. ∎
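The proof of Corollary 1 is constructive, and the construction can be carried out numerically. The Python sketch below evaluates, at a matrix, the Lagrange interpolating polynomial $g$ with $g(\lambda_i) = \overline{\lambda_i}$. The helper names are our own. The test matrix $T = \begin{pmatrix} 1 & 1\\ -1 & 1\end{pmatrix}$ is also our choice: it is normal ($TT^* = T^*T = 2I$), and its eigenvalues $1 \pm i$ are supplied by hand rather than computed. Here $g(z) = 2 - z$, so $g(T)$ should equal $T^*$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lagrange_at_matrix(T, nodes, values):
    """Evaluate g(T), where g is the Lagrange polynomial with g(nodes[i]) = values[i]."""
    n = len(T)
    I = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    G = [[0] * n for _ in range(n)]
    for i, (li, yi) in enumerate(zip(nodes, values)):
        term = [[yi * I[r][c] for c in range(n)] for r in range(n)]
        for j, lj in enumerate(nodes):
            if j != i:   # multiply by (T - lj I) / (li - lj)
                factor = [[(T[r][c] - lj * I[r][c]) / (li - lj) for c in range(n)]
                          for r in range(n)]
                term = matmul(term, factor)
        G = [[G[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return G


T = [[1, 1], [-1, 1]]                 # a normal matrix (our example)
eigenvalues = [1 + 1j, 1 - 1j]        # assumed known, not computed here
G = lagrange_at_matrix(T, eigenvalues, [lam.conjugate() for lam in eigenvalues])
T_star = [[complex(T[c][r]).conjugate() for c in range(2)] for r in range(2)]
print(G)        # numerically [[1, -1], [1, 1]], i.e. T*
```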
Corollary 2. If $F = C$, then $T$ is unitary if and only if $T$ is normal and $|\lambda| = 1$ for every eigenvalue $\lambda$ of $T$.

Proof. If $T$ is unitary, then $T$ is normal and every eigenvalue of $T$ has absolute value 1 by Corollary 2 to Theorem 6.18 (p. 382).

Conversely, suppose that $T$ is normal, and let $T = \lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$ be the spectral decomposition of $T$. If $|\lambda| = 1$ for every eigenvalue $\lambda$ of $T$, then by (c) of the spectral theorem,

$$\begin{aligned}
TT^* &= (\lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k)(\overline{\lambda_1}T_1 + \overline{\lambda_2}T_2 + \cdots + \overline{\lambda_k}T_k)\\
&= |\lambda_1|^2T_1 + |\lambda_2|^2T_2 + \cdots + |\lambda_k|^2T_k\\
&= T_1 + T_2 + \cdots + T_k = I.
\end{aligned}$$

Hence $T$ is unitary. ∎


Corollary 3. If $F = C$ and $T$ is normal, then $T$ is self-adjoint if and only if every eigenvalue of $T$ is real.

Proof. Let $T = \lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$ be the spectral decomposition of $T$. Suppose that every eigenvalue of $T$ is real. Then

$$T^* = \overline{\lambda_1}T_1 + \overline{\lambda_2}T_2 + \cdots + \overline{\lambda_k}T_k = \lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k = T.$$

The converse has been proved in the lemma to Theorem 6.17 (p. 374). ∎


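Corollary 4. Let $T$ be as in the spectral theorem with spectral decomposition $T = \lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$. Then each $T_j$ is a polynomial in $T$.

Proof. Choose a polynomial $g_j$ $(1 \le j \le k)$ such that $g_j(\lambda_i) = \delta_{ij}$. Then

$$\begin{aligned}
g_j(T) &= g_j(\lambda_1)T_1 + g_j(\lambda_2)T_2 + \cdots + g_j(\lambda_k)T_k\\
&= \delta_{1j}T_1 + \delta_{2j}T_2 + \cdots + \delta_{kj}T_k = T_j. \qquad ∎
\end{aligned}$$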
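Corollary 4 can be made explicit. By Lagrange interpolation, $g_j(t) = \prod_{i \ne j}(t - \lambda_i)/(\lambda_j - \lambda_i)$, so $T_j = \prod_{i \ne j}(T - \lambda_iI)/(\lambda_j - \lambda_i)$. The following exact Python sketch applies this to the self-adjoint matrix $A$ with rows $(2, 1, 1)$, $(1, 2, 1)$, $(1, 1, 2)$. The matrix and the helper names are our own choices, and its distinct eigenvalues 4 and 1 are assumed known. The sketch should recover $\frac13$ times the all-ones matrix for the eigenvalue 4 and $I$ minus that matrix for the eigenvalue 1.

```python
from fractions import Fraction

n = 3
A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]          # self-adjoint example (our choice)
I = [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def spectral_projection(A, eigenvalues, j):
    """T_j = prod over i != j of (A - lambda_i I) / (lambda_j - lambda_i)."""
    lam_j = eigenvalues[j]
    P = I
    for i, lam_i in enumerate(eigenvalues):
        if i != j:
            factor = [[Fraction(A[r][c] - lam_i * I[r][c], lam_j - lam_i) for c in range(n)]
                      for r in range(n)]
            P = matmul(P, factor)
    return P


eigenvalues = [4, 1]                              # distinct eigenvalues, assumed known
T1 = spectral_projection(A, eigenvalues, 0)       # projection on the eigenspace of 4
T2 = spectral_projection(A, eigenvalues, 1)       # projection on the eigenspace of 1
print(T1)
print(T2)
```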


EXERCISES 


1. Label the following statements as true or false. Assume that the under- 
lying inner product spaces are finite-dimensional. 
(a) All projections are self-adjoint. 
(b) An orthogonal projection is uniquely determined by its range. 
(c) Every self-adjoint operator is a linear combination of orthogonal 
projections. 
(d) If $T$ is a projection on $W$, then $T(x)$ is the vector in $W$ that is closest to $x$.
(e) Every orthogonal projection is a unitary operator. 


2. Let $V = R^2$, $W = \mathrm{span}(\{(1, 2)\})$, and $\beta$ be the standard ordered basis for $V$. Compute $[T]_\beta$, where $T$ is the orthogonal projection of $V$ on $W$. Do the same for $V = R^3$ and $W = \mathrm{span}(\{(1, 0, 1)\})$.

3. For each of the matrices $A$ in Exercise 2 of Section 6.5:

(1) Verify that $L_A$ possesses a spectral decomposition.

(2) For each eigenvalue of $L_A$, explicitly define the orthogonal projection on the corresponding eigenspace.

(3) Verify your results using the spectral theorem.

4. Let $W$ be a finite-dimensional subspace of an inner product space $V$. Show that if $T$ is the orthogonal projection of $V$ on $W$, then $I - T$ is the orthogonal projection of $V$ on $W^\perp$.

5. Let $T$ be a linear operator on a finite-dimensional inner product space $V$.

(a) If $T$ is an orthogonal projection, prove that $\|T(x)\| \le \|x\|$ for all $x \in V$. Give an example of a projection for which this inequality does not hold. What can be concluded about a projection for which the inequality is actually an equality for all $x \in V$?

(b) Suppose that $T$ is a projection such that $\|T(x)\| \le \|x\|$ for $x \in V$. Prove that $T$ is an orthogonal projection.

6. Let $T$ be a normal operator on a finite-dimensional inner product space. Prove that if $T$ is a projection, then $T$ is also an orthogonal projection.

7. Let $T$ be a normal operator on a finite-dimensional complex inner product space $V$. Use the spectral decomposition $\lambda_1T_1 + \lambda_2T_2 + \cdots + \lambda_kT_k$ of $T$ to prove the following results.

(a) If $g$ is a polynomial, then
$$g(T) = \sum_{i=1}^{k} g(\lambda_i)T_i.$$

(b) If $T^n = T_0$ for some $n$, then $T = T_0$.

(c) Let $U$ be a linear operator on $V$. Then $U$ commutes with $T$ if and only if $U$ commutes with each $T_i$.

(d) There exists a normal operator $U$ on $V$ such that $U^2 = T$.

(e) $T$ is invertible if and only if $\lambda_i \ne 0$ for $1 \le i \le k$.

(f) $T$ is a projection if and only if every eigenvalue of $T$ is 1 or 0.

(g) $T = -T^*$ if and only if every $\lambda_i$ is an imaginary number.


8. Use Corollary 1 of the spectral theorem to show that if T is a normal 
operator on a complex finite-dimensional inner product space and U is 
a linear operator that commutes with T, then U commutes with T*. 


9. Referring to Exercise 20 of Section 6.5, prove the following facts about 
a partial isometry U. 


(a) U*U is an orthogonal projection on W. 
(b) UU*U=U. 


10. Simultaneous diagonalization. Let U and T be normal operators on a 
finite-dimensional complex inner product space V such that TU = UT. 
Prove that there exists an orthonormal basis for V consisting of vectors 
that are eigenvectors of both T and U. Hint: Use the hint of Exercise 14 
of Section 6.4 along with Exercise 8. 


11. Prove (c) of the spectral theorem. 


6.7* THE SINGULAR VALUE DECOMPOSITION 
AND THE PSEUDOINVERSE 


In Section 6.4, we characterized normal operators on complex spaces and self- 
adjoint operators on real spaces in terms of orthonormal bases of eigenvectors 
and their corresponding eigenvalues (Theorems 6.16, p. 372, and 6.17, p. 374). 
In this section, we establish a comparable theorem whose scope is the entire 
class of linear transformations on both complex and real finite-dimensional 
inner product spaces—the singular value theorem for linear transformations 
(Theorem 6.26). There are similarities and differences among these theorems. 
All rely on the use of orthonormal bases and numerical invariants. However, 
because of its general scope, the singular value theorem is concerned with 
two (usually distinct) inner product spaces and with two (usually distinct) 
orthonormal bases. If the two spaces and the two bases are identical, then the 
transformation would, in fact, be a normal or self-adjoint operator. Another 
difference is that the numerical invariants in the singular value theorem, the 
singular values, are nonnegative, in contrast to their counterparts, the eigen- 
values, for which there is no such restriction. This property is necessary to 
guarantee the uniqueness of singular values. 

The singular value theorem encompasses both real and complex spaces. 
For brevity, in this section we use the terms unitary operator and unitary 
matrix to include orthogonal operators and orthogonal matrices in the context 
of real spaces. Thus any operator $T$ for which $\langle T(x), T(y)\rangle = \langle x, y\rangle$, or any matrix $A$ for which $\langle Ax, Ay\rangle = \langle x, y\rangle$, for all $x$ and $y$ is called unitary for the purposes of this section.
In Exercise 15 of Section 6.3, the definition of the adjoint of an operator is extended to any linear transformation $T\colon V \to W$, where $V$ and $W$ are finite-dimensional inner product spaces. By this exercise, the adjoint $T^*$ of $T$ is a linear transformation from $W$ to $V$ and $[T^*]_\gamma^\beta = ([T]_\beta^\gamma)^*$, where $\beta$ and $\gamma$ are orthonormal bases for $V$ and $W$, respectively. Furthermore, the linear operator $T^*T$ on $V$ is positive semidefinite and $\mathrm{rank}(T^*T) = \mathrm{rank}(T)$ by Exercise 18 of Section 6.4.

With these facts in mind, we begin with the principal result. 


Theorem 6.26 (Singular Value Theorem for Linear Transformations). Let $V$ and $W$ be finite-dimensional inner product spaces, and let $T\colon V \to W$ be a linear transformation of rank $r$. Then there exist orthonormal bases $\{v_1, v_2, \ldots, v_n\}$ for $V$ and $\{u_1, u_2, \ldots, u_m\}$ for $W$ and positive scalars $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ such that

$$T(v_i) = \begin{cases} \sigma_iu_i & \text{if } 1 \le i \le r\\ 0 & \text{if } i > r. \end{cases} \tag{4}$$

Conversely, suppose that the preceding conditions are satisfied. Then for $1 \le i \le n$, $v_i$ is an eigenvector of $T^*T$ with corresponding eigenvalue $\sigma_i^2$ if $1 \le i \le r$ and $0$ if $i > r$. Therefore the scalars $\sigma_1, \sigma_2, \ldots, \sigma_r$ are uniquely determined by $T$.


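Proof. We first establish the existence of the bases and scalars. By Exercises 18 of Section 6.4 and 15(d) of Section 6.3, $T^*T$ is a positive semidefinite linear operator of rank $r$ on $V$; hence there is an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ for $V$ consisting of eigenvectors of $T^*T$ with corresponding eigenvalues $\lambda_i$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$, and $\lambda_i = 0$ for $i > r$. For $1 \le i \le r$, define $\sigma_i = \sqrt{\lambda_i}$ and $u_i = \dfrac{1}{\sigma_i}T(v_i)$. We show that $\{u_1, u_2, \ldots, u_r\}$ is an orthonormal subset of $W$. Suppose $1 \le i, j \le r$. Then

$$\begin{aligned}
\langle u_i, u_j\rangle &= \left\langle \frac{1}{\sigma_i}T(v_i), \frac{1}{\sigma_j}T(v_j)\right\rangle = \frac{1}{\sigma_i\sigma_j}\langle T^*T(v_i), v_j\rangle\\
&= \frac{1}{\sigma_i\sigma_j}\langle \lambda_iv_i, v_j\rangle = \frac{\sigma_i^2}{\sigma_i\sigma_j}\langle v_i, v_j\rangle = \delta_{ij},
\end{aligned}$$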
and hence $\{u_1, u_2, \ldots, u_r\}$ is orthonormal. By Theorem 6.7(a) (p. 352), this set extends to an orthonormal basis $\{u_1, u_2, \ldots, u_r, \ldots, u_m\}$ for $W$. Clearly $T(v_i) = \sigma_iu_i$ if $1 \le i \le r$. If $i > r$, then $T^*T(v_i) = 0$, and so $T(v_i) = 0$ by Exercise 15(d) of Section 6.3.

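To establish uniqueness, suppose that $\{v_1, v_2, \ldots, v_n\}$, $\{u_1, u_2, \ldots, u_m\}$, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ satisfy the properties stated in the first part of the theorem. Then for $1 \le i \le m$ and $1 \le j \le n$,

$$\langle T^*(u_i), v_j\rangle = \langle u_i, T(v_j)\rangle = \begin{cases} \sigma_i & \text{if } i = j \le r\\ 0 & \text{otherwise,} \end{cases}$$

and hence, since $T^*(u_i) = \sum_{j=1}^{n} \langle T^*(u_i), v_j\rangle v_j$ in the orthonormal basis $\{v_1, v_2, \ldots, v_n\}$, for any $1 \le i \le m$,

$$T^*(u_i) = \begin{cases} \sigma_iv_i & \text{if } i \le r\\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

So for $i \le r$,

$$T^*T(v_i) = T^*(\sigma_iu_i) = \sigma_iT^*(u_i) = \sigma_i^2v_i,$$

and $T^*T(v_i) = T^*(0) = 0$ for $i > r$. Therefore each $v_i$ is an eigenvector of $T^*T$ with corresponding eigenvalue $\sigma_i^2$ if $i \le r$ and $0$ if $i > r$. ∎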
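The existence half of the proof is an algorithm: diagonalize $T^*T$, take square roots of its eigenvalues, and set $u_i = \frac{1}{\sigma_i}T(v_i)$. For a real $2 \times 2$ matrix, this can be sketched in Python with the closed-form eigenpairs of a symmetric $2 \times 2$ matrix. The helper names are our own. The sketch assumes that the matrix is invertible and that the off-diagonal entry of $A^tA$ is nonzero. The test matrix is the one treated in Example 4, and the signs of the computed $v_i$ and $u_i$ may differ from a hand computation.

```python
import math


def sym_eig_2x2(a, b, d):
    """Eigenpairs of [[a, b], [b, d]], largest eigenvalue first (assumes b != 0)."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        v = (b, lam - a)                  # solves (a - lam) x + b y = 0
        norm = math.hypot(*v)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


def svd_2x2(A):
    """Return [(sigma_i, v_i, u_i)] with A v_i = sigma_i u_i, following Theorem 6.26."""
    (p, q), (r, s) = A
    # A^t A = [[p^2 + r^2, pq + rs], [pq + rs, q^2 + s^2]]
    result = []
    for lam, v in sym_eig_2x2(p * p + r * r, p * q + r * s, q * q + s * s):
        sigma = math.sqrt(lam)            # assumes lam > 0, i.e. A is invertible
        Av = (p * v[0] + q * v[1], r * v[0] + s * v[1])
        result.append((sigma, v, (Av[0] / sigma, Av[1] / sigma)))
    return result


triples = svd_2x2([[11, -5], [-2, 10]])
for sigma, v, u in triples:
    print(sigma, v, u)
```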


Definition. The unique scalars $\sigma_1, \sigma_2, \ldots, \sigma_r$ in Theorem 6.26 are called the singular values of $T$. If $r$ is less than both $m$ and $n$, then the term singular value is extended to include $\sigma_{r+1} = \cdots = \sigma_k = 0$, where $k$ is the minimum of $m$ and $n$.


Although the singular values of a linear transformation T are uniquely de- 
termined by T, the orthonormal bases given in the statement of Theorem 6.26 
are not uniquely determined because there is more than one orthonormal basis 
of eigenvectors of T*T. 

In view of (5), the singular values of a linear transformation $T\colon V \to W$ and its adjoint $T^*$ are identical. Furthermore, the orthonormal bases for $V$ and $W$ given in Theorem 6.26 are simply reversed for $T^*$.


Example 1 


Let $P_2(R)$ and $P_1(R)$ be the polynomial spaces with inner products defined by

$$\langle f(x), g(x)\rangle = \int_{-1}^{1} f(t)g(t)\,dt.$$

Let $T\colon P_2(R) \to P_1(R)$ be the linear transformation defined by $T(f(x)) = f'(x)$. Find orthonormal bases $\beta = \{v_1, v_2, v_3\}$ for $P_2(R)$ and $\gamma = \{u_1, u_2\}$ for $P_1(R)$ such that $T(v_i) = \sigma_iu_i$ for $i = 1, 2$ and $T(v_3) = 0$, where $\sigma_1 \ge \sigma_2 > 0$ are the nonzero singular values of $T$.


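To facilitate the computations, we translate this problem into the corresponding problem for a matrix representation of $T$. Caution is advised here because not any matrix representation will do. Since the adjoint is defined in terms of inner products, we must use a matrix representation constructed from orthonormal bases for $P_2(R)$ and $P_1(R)$ to guarantee that the adjoint of the matrix representation of $T$ is the same as the matrix representation of the adjoint of $T$. (See Exercise 15 of Section 6.3.) For this purpose, we use the results of Exercise 21(a) of Section 6.2 to obtain orthonormal bases

$$\alpha = \left\{\frac{1}{\sqrt2},\ \sqrt{\frac32}\,x,\ \frac12\sqrt{\frac52}\,(3x^2 - 1)\right\} \quad\text{and}\quad \alpha' = \left\{\frac{1}{\sqrt2},\ \sqrt{\frac32}\,x\right\}$$

for $P_2(R)$ and $P_1(R)$, respectively. Let

$$A = [T]_\alpha^{\alpha'} = \begin{pmatrix} 0 & \sqrt3 & 0\\ 0 & 0 & \sqrt{15} \end{pmatrix}.$$

Then

$$A^*A = \begin{pmatrix} 0 & 0\\ \sqrt3 & 0\\ 0 & \sqrt{15} \end{pmatrix}\begin{pmatrix} 0 & \sqrt3 & 0\\ 0 & 0 & \sqrt{15} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 15 \end{pmatrix},$$

which has eigenvalues (listed in descending order of size) $\lambda_1 = 15$, $\lambda_2 = 3$, and $\lambda_3 = 0$. These eigenvalues correspond, respectively, to the orthonormal eigenvectors $e_3 = (0, 0, 1)$, $e_2 = (0, 1, 0)$, and $e_1 = (1, 0, 0)$ in $R^3$. Translating everything into the context of $T$, $P_2(R)$, and $P_1(R)$, let

$$v_1 = \frac12\sqrt{\frac52}\,(3x^2 - 1), \quad v_2 = \sqrt{\frac32}\,x, \quad\text{and}\quad v_3 = \frac{1}{\sqrt2}.$$

Then $\beta = \{v_1, v_2, v_3\}$ is an orthonormal basis for $P_2(R)$ consisting of eigenvectors of $T^*T$ with corresponding eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$. Now set $\sigma_1 = \sqrt{\lambda_1} = \sqrt{15}$ and $\sigma_2 = \sqrt{\lambda_2} = \sqrt3$, the nonzero singular values of $T$, and take

$$u_1 = \frac{1}{\sigma_1}T(v_1) = \sqrt{\frac32}\,x \quad\text{and}\quad u_2 = \frac{1}{\sigma_2}T(v_2) = \frac{1}{\sqrt2}$$

to obtain the required basis $\gamma = \{u_1, u_2\}$ for $P_1(R)$. ♦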
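The matrix $A$ in Example 1 can be double-checked by computing $A_{ij} = \langle T(b_j), c_i\rangle$, where $b_j$ and $c_i$ run through the orthonormal bases $\alpha$ and $\alpha'$. This formula is valid because $\alpha'$ is orthonormal. In this Python sketch, the helper names are our own, and a polynomial is a list of coefficients in increasing powers of $x$. The inner product $\int_{-1}^{1}p(t)q(t)\,dt$ is evaluated term by term with $\int_{-1}^{1}t^k\,dt = 2/(k+1)$ for even $k$ and $0$ for odd $k$, so only the square roots in the bases introduce rounding.

```python
import math


def mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def inner(p, q):
    """<p, q> = integral over [-1, 1] of p(t) q(t) dt."""
    return sum(c * 2 / (k + 1) for k, c in enumerate(mul(p, q)) if k % 2 == 0)


def derivative(p):
    """T(p) = p'; the derivative of a constant is the zero polynomial [0.0]."""
    return [k * p[k] for k in range(1, len(p))] or [0.0]


s = math.sqrt
alpha = [[1 / s(2)], [0.0, s(1.5)], [-0.5 * s(2.5), 0.0, 1.5 * s(2.5)]]  # basis of P_2(R)
alpha_prime = [[1 / s(2)], [0.0, s(1.5)]]                               # basis of P_1(R)

A = [[inner(derivative(b), c) for b in alpha] for c in alpha_prime]
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(3)] for i in range(3)]
print(A)     # approximately [[0, sqrt(3), 0], [0, 0, sqrt(15)]]
print(AtA)   # approximately diag(0, 3, 15)
```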
We can use singular values to describe how a figure is distorted by a linear transformation. This is illustrated in the next example.


Example 2 


Let $T$ be an invertible linear operator on $R^2$ and $S = \{x \in R^2 : \|x\| = 1\}$, the unit circle in $R^2$. We apply Theorem 6.26 to describe $S' = T(S)$.

Since $T$ is invertible, it has rank equal to 2 and hence has singular values $\sigma_1 \ge \sigma_2 > 0$. Let $\{v_1, v_2\}$ and $\beta = \{u_1, u_2\}$ be orthonormal bases for $R^2$ so that $T(v_1) = \sigma_1u_1$ and $T(v_2) = \sigma_2u_2$, as in Theorem 6.26. Then $\beta$ determines a coordinate system, which we shall call the $x'y'$-coordinate system for $R^2$, where the $x'$-axis contains $u_1$ and the $y'$-axis contains $u_2$. For any vector $u \in R^2$, if $u = x_1'u_1 + x_2'u_2$, then $[u]_\beta = \begin{pmatrix} x_1'\\ x_2' \end{pmatrix}$ is the coordinate vector of $u$ relative to $\beta$. We characterize $S'$ in terms of an equation relating $x_1'$ and $x_2'$.

For any vector $v = x_1v_1 + x_2v_2 \in R^2$, the equation $u = T(v)$ means that

$$u = T(x_1v_1 + x_2v_2) = x_1T(v_1) + x_2T(v_2) = x_1\sigma_1u_1 + x_2\sigma_2u_2.$$

Thus for $u = x_1'u_1 + x_2'u_2$, we have $x_1' = x_1\sigma_1$ and $x_2' = x_2\sigma_2$. Furthermore, $u \in S'$ if and only if $v \in S$ if and only if

$$\frac{(x_1')^2}{\sigma_1^2} + \frac{(x_2')^2}{\sigma_2^2} = x_1^2 + x_2^2 = 1.$$

If $\sigma_1 = \sigma_2$, this is the equation of a circle of radius $\sigma_1$, and if $\sigma_1 > \sigma_2$, this is the equation of an ellipse with major axis and minor axis oriented along the $x'$-axis and the $y'$-axis, respectively. (See Figure 6.6.)


Ss’ 


PSs 
| 


V=2X1U1 + ©2V2 02 


u= vu + £gue 


Figure 6.6 
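The equation for $S'$ can be tested numerically. As a concrete invertible operator, the Python sketch below borrows the matrix $\begin{pmatrix} 11 & -5\\ -2 & 10\end{pmatrix}$. It also uses the singular values $\sigma_1 = 10\sqrt2$, $\sigma_2 = 5\sqrt2$ and the vectors $u_1 = \frac15(4, -3)$, $u_2 = \frac15(3, 4)$ that Example 4 computes for this matrix. The sample points and helper names are our own choices. The sketch maps points of the unit circle and checks that their $x'y'$-coordinates satisfy $(x_1')^2/\sigma_1^2 + (x_2')^2/\sigma_2^2 = 1$.

```python
import math

A = [[11, -5], [-2, 10]]                       # the operator T = L_A (borrowed example)
sigma1, sigma2 = 10 * math.sqrt(2), 5 * math.sqrt(2)
u1, u2 = (4 / 5, -3 / 5), (3 / 5, 4 / 5)


def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]


def ellipse_residual(theta):
    """Map v = (cos theta, sin theta) in S to u = Av and return
    (x1')^2 / sigma1^2 + (x2')^2 / sigma2^2 - 1."""
    v = (math.cos(theta), math.sin(theta))
    u = (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])
    x1p, x2p = dot(u, u1), dot(u, u2)          # coordinates of u relative to {u1, u2}
    return (x1p / sigma1) ** 2 + (x2p / sigma2) ** 2 - 1


residuals = [ellipse_residual(2 * math.pi * k / 12) for k in range(12)]
print(max(abs(r) for r in residuals))          # tiny, up to rounding
```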
The singular value theorem for linear transformations is useful in its matrix form because we can perform numerical computations on matrices. We begin with the definition of the singular values of a matrix.


Definition. Let $A$ be an $m \times n$ matrix. We define the singular values of $A$ to be the singular values of the linear transformation $L_A$.


Theorem 6.27 (Singular Value Decomposition Theorem for Matrices). Let $A$ be an $m \times n$ matrix of rank $r$ with the positive singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$, and let $\Sigma$ be the $m \times n$ matrix defined by

$$\Sigma_{ij} = \begin{cases} \sigma_i & \text{if } i = j \le r\\ 0 & \text{otherwise.} \end{cases}$$

Then there exists an $m \times m$ unitary matrix $U$ and an $n \times n$ unitary matrix $V$ such that

$$A = U\Sigma V^*.$$


Proof. Let $T = L_A\colon F^n \to F^m$. By Theorem 6.26, there exist orthonormal bases $\beta = \{v_1, v_2, \ldots, v_n\}$ for $F^n$ and $\gamma = \{u_1, u_2, \ldots, u_m\}$ for $F^m$ such that $T(v_i) = \sigma_iu_i$ for $1 \le i \le r$ and $T(v_i) = 0$ for $i > r$. Let $U$ be the $m \times m$ matrix whose $j$th column is $u_j$ for all $j$, and let $V$ be the $n \times n$ matrix whose $j$th column is $v_j$ for all $j$. Note that both $U$ and $V$ are unitary matrices.

By Theorem 2.13(a) (p. 90), the $j$th column of $AV$ is $Av_j = \sigma_ju_j$, where we set $\sigma_j = 0$ for $j > r$. Observe that the $j$th column of $\Sigma$ is $\sigma_je_j$, where $e_j$ is the $j$th standard vector of $F^m$. So by Theorem 2.13(a) and (b), the $j$th column of $U\Sigma$ is given by

$$U(\sigma_je_j) = \sigma_jU(e_j) = \sigma_ju_j.$$

It follows that $AV$ and $U\Sigma$ are $m \times n$ matrices whose corresponding columns are equal, and hence $AV = U\Sigma$. Therefore $A = AVV^* = U\Sigma V^*$. ∎


Definition. Let $A$ be an $m \times n$ matrix of rank $r$ with positive singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. A factorization $A = U\Sigma V^*$ where $U$ and $V$ are unitary matrices and $\Sigma$ is the $m \times n$ matrix defined as in Theorem 6.27 is called a singular value decomposition of $A$.


In the proof of Theorem 6.27, the columns of $V$ are the vectors in $\beta$, and the columns of $U$ are the vectors in $\gamma$. Furthermore, the nonzero singular values of $A$ are the same as those of $L_A$; hence they are the square roots of the nonzero eigenvalues of $A^*A$ or of $AA^*$. (See Exercise 9.)
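Example 3

We find a singular value decomposition for $A = \begin{pmatrix} 1 & 1 & -1\\ 1 & 1 & -1 \end{pmatrix}$.

First observe that for

$$v_1 = \frac{1}{\sqrt3}\begin{pmatrix} 1\\ 1\\ -1 \end{pmatrix}, \quad v_2 = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}, \quad\text{and}\quad v_3 = \frac{1}{\sqrt6}\begin{pmatrix} 1\\ 1\\ 2 \end{pmatrix},$$

the set $\beta = \{v_1, v_2, v_3\}$ is an orthonormal basis for $R^3$ consisting of eigenvectors of $A^*A$ with corresponding eigenvalues $\lambda_1 = 6$ and $\lambda_2 = \lambda_3 = 0$. Consequently, $\sigma_1 = \sqrt6$ is the only nonzero singular value of $A$. Hence, as in the proof of Theorem 6.27, we let $V$ be the matrix whose columns are the vectors in $\beta$. Then

$$\Sigma = \begin{pmatrix} \sqrt6 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad V = \begin{pmatrix} \frac{1}{\sqrt3} & \frac{1}{\sqrt2} & \frac{1}{\sqrt6}\\ \frac{1}{\sqrt3} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt6}\\ -\frac{1}{\sqrt3} & 0 & \frac{2}{\sqrt6} \end{pmatrix}.$$

Also, as in the proof of Theorem 6.26, $u_1 = \frac{1}{\sigma_1}Av_1 = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ 1 \end{pmatrix}$. Next choose $u_2 = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ -1 \end{pmatrix}$, a unit vector orthogonal to $u_1$, to obtain the orthonormal basis $\gamma = \{u_1, u_2\}$ for $R^2$, and set

$$U = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2}\\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \end{pmatrix}.$$

Then $A = U\Sigma V^*$ is the desired singular value decomposition. ♦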


The Polar Decomposition of a Square Matrix 




A singular value decomposition of a matrix can be used to factor a square matrix in a manner analogous to the factoring of a complex number as the product of a complex number of length 1 and a nonnegative number. In the case of matrices, the complex number of length 1 is replaced by a unitary matrix, and the nonnegative number is replaced by a positive semidefinite matrix.


Theorem 6.28 (Polar Decomposition). For any square matrix $A$, there exists a unitary matrix $W$ and a positive semidefinite matrix $P$ such that

$$A = WP.$$

Furthermore, if $A$ is invertible, then the representation is unique.


Proof. By Theorem 6.27, there exist unitary matrices $U$ and $V$ and a diagonal matrix $\Sigma$ with nonnegative diagonal entries such that $A = U\Sigma V^*$. So

$$A = U\Sigma V^* = UV^*V\Sigma V^* = WP,$$

where $W = UV^*$ and $P = V\Sigma V^*$. Since $W$ is the product of unitary matrices, $W$ is unitary, and since $\Sigma$ is positive semidefinite and $P$ is unitarily equivalent to $\Sigma$, $P$ is positive semidefinite by Exercise 14 of Section 6.5.

Now suppose that $A$ is invertible and factors as the products

$$A = WP = ZQ,$$

where $W$ and $Z$ are unitary and $P$ and $Q$ are positive semidefinite. Since $A$ is invertible, it follows that $P$ and $Q$ are positive definite and invertible, and therefore $Z^*W = QP^{-1}$. Thus $QP^{-1}$ is unitary, and so

$$I = (QP^{-1})^*(QP^{-1}) = P^{-1}Q^2P^{-1}.$$

Hence $P^2 = Q^2$. Since both $P$ and $Q$ are positive definite, it follows that $P = Q$ by Exercise 17 of Section 6.4. Therefore $W = Z$, and consequently the factorization is unique. ∎


The factorization of a square matrix $A$ as $WP$, where $W$ is unitary and $P$ is positive semidefinite, is called a polar decomposition of $A$.
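Example 4

To find the polar decomposition of $A = \begin{pmatrix} 11 & -5\\ -2 & 10 \end{pmatrix}$, we begin by finding a singular value decomposition $U\Sigma V^*$ of $A$. The object is to find an orthonormal basis $\beta$ for $R^2$ consisting of eigenvectors of $A^*A$. It can be shown that

$$v_1 = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ -1 \end{pmatrix} \quad\text{and}\quad v_2 = \frac{1}{\sqrt2}\begin{pmatrix} 1\\ 1 \end{pmatrix}$$

are orthonormal eigenvectors of $A^*A$ with corresponding eigenvalues $\lambda_1 = 200$ and $\lambda_2 = 50$. So $\beta = \{v_1, v_2\}$ is an appropriate basis. Thus $\sigma_1 = \sqrt{200} = 10\sqrt2$ and $\sigma_2 = \sqrt{50} = 5\sqrt2$ are the singular values of $A$. So we have

$$V = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2}\\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix} \quad\text{and}\quad \Sigma = \begin{pmatrix} 10\sqrt2 & 0\\ 0 & 5\sqrt2 \end{pmatrix}.$$

Next, we find the columns $u_1$ and $u_2$ of $U$:

$$u_1 = \frac{1}{\sigma_1}Av_1 = \frac15\begin{pmatrix} 4\\ -3 \end{pmatrix} \quad\text{and}\quad u_2 = \frac{1}{\sigma_2}Av_2 = \frac15\begin{pmatrix} 3\\ 4 \end{pmatrix}.$$

Thus

$$U = \begin{pmatrix} \frac45 & \frac35\\ -\frac35 & \frac45 \end{pmatrix}.$$

Therefore, in the notation of Theorem 6.28, we have

$$W = UV^* = \begin{pmatrix} \frac45 & \frac35\\ -\frac35 & \frac45 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt2} & -\frac{1}{\sqrt2}\\ \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix} = \frac{1}{5\sqrt2}\begin{pmatrix} 7 & -1\\ 1 & 7 \end{pmatrix}$$

and

$$P = V\Sigma V^* = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2}\\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix}\begin{pmatrix} 10\sqrt2 & 0\\ 0 & 5\sqrt2 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt2} & -\frac{1}{\sqrt2}\\ \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{pmatrix} = \frac{5}{\sqrt2}\begin{pmatrix} 3 & -1\\ -1 & 3 \end{pmatrix}. \qquad ♦$$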
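The arithmetic of Example 4 can be confirmed with a short Python sketch that multiplies out the factors found above. The helper names are our own. $W = UV^*$ should be orthogonal, $P = V\Sigma V^*$ should be symmetric with positive eigenvalues, and $WP$ should reproduce $A$. For these real matrices, $V^*$ is just the transpose.

```python
import math

r2 = math.sqrt(2)
A = [[11, -5], [-2, 10]]
U = [[4 / 5, 3 / 5], [-3 / 5, 4 / 5]]
Sigma = [[10 * r2, 0], [0, 5 * r2]]
V = [[1 / r2, 1 / r2], [-1 / r2, 1 / r2]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))


W = matmul(U, transpose(V))                    # unitary factor
P = matmul(matmul(V, Sigma), transpose(V))     # positive semidefinite factor
print(W)             # approximately (1/(5 sqrt 2)) [[7, -1], [1, 7]]
print(P)             # approximately (5/sqrt 2) [[3, -1], [-1, 3]]
print(matmul(W, P))  # approximately A
```

Since $P$ is symmetric with positive trace and positive determinant, both of its eigenvalues are positive, as the test below confirms.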


The Pseudoinverse 


Let $V$ and $W$ be finite-dimensional inner product spaces over the same field, and let $T\colon V \to W$ be a linear transformation. It is desirable to have a linear transformation from $W$ to $V$ that captures some of the essence of an inverse of $T$ even if $T$ is not invertible. A simple approach to this problem is to focus on the “part” of $T$ that is invertible, namely, the restriction of $T$ to $N(T)^\perp$. Let $L\colon N(T)^\perp \to R(T)$ be the linear transformation defined by $L(x) = T(x)$ for all $x \in N(T)^\perp$. Then $L$ is invertible, and we can use the inverse of $L$ to construct a linear transformation from $W$ to $V$ that salvages some of the benefits of an inverse of $T$.


Definition. Let $V$ and $W$ be finite-dimensional inner product spaces over the same field, and let $T\colon V \to W$ be a linear transformation. Let $L\colon N(T)^\perp \to R(T)$ be the linear transformation defined by $L(x) = T(x)$ for all $x \in N(T)^\perp$. The pseudoinverse (or Moore–Penrose generalized inverse) of $T$, denoted by $T^\dagger$, is defined as the unique linear transformation from $W$ to $V$ such that

$$T^\dagger(y) = \begin{cases} L^{-1}(y) & \text{for } y \in R(T)\\ 0 & \text{for } y \in R(T)^\perp. \end{cases}$$


The pseudoinverse of a linear transformation $T$ on a finite-dimensional inner product space exists even if $T$ is not invertible. Furthermore, if $T$ is invertible, then $T^\dagger = T^{-1}$ because $N(T)^\perp = V$, and $L$ (as just defined) coincides with $T$.

As an extreme example, consider the zero transformation $T_0\colon V \to W$ between two finite-dimensional inner product spaces $V$ and $W$. Then $R(T_0) = \{0\}$, and therefore $T_0^\dagger$ is the zero transformation from $W$ to $V$.
We can use the singular value theorem to describe the pseudoinverse of a linear transformation. Suppose that $V$ and $W$ are finite-dimensional inner product spaces and $T\colon V \to W$ is a linear transformation of rank $r$. Let $\{v_1, v_2, \ldots, v_n\}$ and $\{u_1, u_2, \ldots, u_m\}$ be orthonormal bases for $V$ and $W$, respectively, and let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ be the nonzero singular values of $T$ satisfying (4) in Theorem 6.26. Then $\{v_1, v_2, \ldots, v_r\}$ is a basis for $N(T)^\perp$, $\{v_{r+1}, v_{r+2}, \ldots, v_n\}$ is a basis for $N(T)$, $\{u_1, u_2, \ldots, u_r\}$ is a basis for $R(T)$, and $\{u_{r+1}, u_{r+2}, \ldots, u_m\}$ is a basis for $R(T)^\perp$. Let $L$ be the restriction of $T$ to $N(T)^\perp$, as in the definition of pseudoinverse. Then $L^{-1}(u_i) = \dfrac{1}{\sigma_i}v_i$ for $1 \le i \le r$. Therefore

$$T^\dagger(u_i) = \begin{cases} \dfrac{1}{\sigma_i}v_i & \text{if } 1 \le i \le r\\ 0 & \text{if } r < i \le m. \end{cases} \tag{6}$$


Example 5 


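Let $T\colon P_2(R) \to P_1(R)$ be the linear transformation defined by $T(f(x)) = f'(x)$, as in Example 1. Let $\beta = \{v_1, v_2, v_3\}$ and $\gamma = \{u_1, u_2\}$ be the orthonormal bases for $P_2(R)$ and $P_1(R)$ in Example 1. Then $\sigma_1 = \sqrt{15}$ and $\sigma_2 = \sqrt3$ are the nonzero singular values of $T$. It follows that

$$T^\dagger\left(\sqrt{\frac32}\,x\right) = T^\dagger(u_1) = \frac{1}{\sigma_1}v_1 = \frac{1}{\sqrt{15}}\cdot\frac12\sqrt{\frac52}\,(3x^2 - 1),$$

and hence

$$T^\dagger(x) = \frac16(3x^2 - 1).$$

Similarly, $T^\dagger(1) = x$. Thus, for any polynomial $a + bx \in P_1(R)$,

$$T^\dagger(a + bx) = aT^\dagger(1) + bT^\dagger(x) = ax + \frac{b}{6}(3x^2 - 1). \qquad ♦$$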


The Pseudoinverse of a Matrix 


Let $A$ be an $m \times n$ matrix. Then there exists a unique $n \times m$ matrix $B$ such that $(L_A)^\dagger\colon F^m \to F^n$ is equal to the left-multiplication transformation $L_B$. We call $B$ the pseudoinverse of $A$ and denote it by $B = A^\dagger$. Thus

$$(L_A)^\dagger = L_{A^\dagger}.$$


Let $A$ be an $m \times n$ matrix of rank $r$. The pseudoinverse of $A$ can be computed with the aid of a singular value decomposition $A = U\Sigma V^*$. Let $\beta$ and $\gamma$ be the ordered bases whose vectors are the columns of $V$ and $U$, respectively, and let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ be the nonzero singular values of $A$. Then $\beta$ and $\gamma$ are orthonormal bases for $F^n$ and $F^m$, respectively, and (4) and (6) are satisfied for $T = L_A$. Reversing the roles of $\beta$ and $\gamma$ in the proof of Theorem 6.27, we obtain the following result.


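Theorem 6.29. Let $A$ be an $m \times n$ matrix of rank $r$ with a singular value decomposition $A = U\Sigma V^*$ and nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. Let $\Sigma^\dagger$ be the $n \times m$ matrix defined by

$$\Sigma^\dagger_{ij} = \begin{cases} \dfrac{1}{\sigma_i} & \text{if } i = j \le r\\ 0 & \text{otherwise.} \end{cases}$$

Then $A^\dagger = V\Sigma^\dagger U^*$, and this is a singular value decomposition of $A^\dagger$.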


Notice that 41 as defined in Theorem 6.29 is actually the pseudoinverse 
of &. 


Example 6 


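We find $A^\dagger$ for the matrix $A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}$.

Since $A$ is the matrix of Example 3, we can use the singular value decomposition obtained in that example:

$$A = U\Sigma V^* = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\[2pt] \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \end{pmatrix} \begin{pmatrix} \sqrt6 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt3} & \frac{1}{\sqrt3} & -\frac{1}{\sqrt3} \\[2pt] \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & 0 \\[2pt] \frac{1}{\sqrt6} & \frac{1}{\sqrt6} & \frac{2}{\sqrt6} \end{pmatrix}.$$

By Theorem 6.29, we have

$$A^\dagger = V\Sigma^\dagger U^* = \begin{pmatrix} \frac{1}{\sqrt3} & \frac{1}{\sqrt2} & \frac{1}{\sqrt6} \\[2pt] \frac{1}{\sqrt3} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt6} \\[2pt] -\frac{1}{\sqrt3} & 0 & \frac{2}{\sqrt6} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt6} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\[2pt] \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}.$$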
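Theorem 6.29 can be checked numerically for this example. The floating-point sketch below first confirms that the three factors above multiply back to $A$, then forms $V\Sigma^\dagger U^*$ (all entries are real, so $U^* = U^t$) and compares it with the matrix just obtained. Plain lists keep it dependency-free; the helper names `matmul`, `transpose`, and `close` are hypothetical.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison with an absolute tolerance."""
    return all(math.isclose(x, y, abs_tol=tol) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Factors of A = U Sigma V* from Example 3 (real entries, so * is the transpose).
U = [[1 / s2, 1 / s2],
     [1 / s2, -1 / s2]]
Sigma = [[s6, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
V = [[1 / s3, 1 / s2, 1 / s6],          # columns are v1, v2, v3
     [1 / s3, -1 / s2, 1 / s6],
     [-1 / s3, 0.0, 2 / s6]]
assert close(matmul(matmul(U, Sigma), transpose(V)), [[1, 1, -1], [1, 1, -1]])

# Sigma-dagger (3 x 2): 1/sigma_i in position (i, i) for i <= r = 1, zeros elsewhere.
Sigma_dagger = [[1 / s6, 0.0],
                [0.0, 0.0],
                [0.0, 0.0]]

A_dagger = matmul(matmul(V, Sigma_dagger), transpose(U))   # Theorem 6.29
assert close(A_dagger, [[1 / 6, 1 / 6], [1 / 6, 1 / 6], [-1 / 6, -1 / 6]])
```

Because $r = 1$ here, only $v_1$ and $u_1$ contribute, so the sign chosen for $u_2$ in Example 3 does not affect $A^\dagger$.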


Notice that the linear transformation $T$ of Example 5 is $L_A$, where $A$ is
the matrix of Example 6, and that $T^\dagger = L_{A^\dagger}$.


The Pseudoinverse and Systems of Linear Equations 


Let $A$ be an $m \times n$ matrix with entries in $F$. Then for any $b \in F^m$, the
matrix equation $Ax = b$ is a system of linear equations, and so it either has no
solutions, a unique solution, or infinitely many solutions. We know that the
system has a unique solution for every $b \in F^m$ if and only if $A$ is invertible,
in which case the solution is given by $A^{-1}b$. Furthermore, if $A$ is invertible,
then $A^{-1} = A^\dagger$, and so the solution can be written as $x = A^\dagger b$. If, on the
other hand, $A$ is not invertible or the system $Ax = b$ is inconsistent, then $A^\dagger b$
still exists. We therefore pose the following question: In general, how is the
vector $A^\dagger b$ related to the system of linear equations $Ax = b$?

In order to answer this question, we need the following lemma. 


Lemma. Let $V$ and $W$ be finite-dimensional inner product spaces, and let
$T\colon V \to W$ be linear. Then
(a) $T^\dagger T$ is the orthogonal projection of $V$ on $N(T)^\perp$.
(b) $TT^\dagger$ is the orthogonal projection of $W$ on $R(T)$.


Proof. As in the earlier discussion, we define $L\colon N(T)^\perp \to W$ by $L(x) =
T(x)$ for all $x \in N(T)^\perp$. If $x \in N(T)^\perp$, then $T^\dagger T(x) = L^{-1}L(x) = x$, and if
$x \in N(T)$, then $T^\dagger T(x) = T^\dagger(0) = 0$. Consequently $T^\dagger T$ is the orthogonal
projection of $V$ on $N(T)^\perp$. This proves (a).

The proof of (b) is similar and is left as an exercise. ∎
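For the matrix of Example 6 the lemma can be confirmed with exact arithmetic: $A^\dagger A$ and $AA^\dagger$ should be idempotent and symmetric (self-adjoint in the real case), with $A^\dagger A$ fixing $N(L_A)^\perp = \operatorname{span}\{(1, 1, -1)\}$ and killing $N(L_A)$, and $AA^\dagger$ fixing $R(L_A) = \operatorname{span}\{(1, 1)\}$. The sketch hard-codes $A^\dagger$ from Example 6; the helper names are hypothetical.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1, 1, -1], [1, 1, -1]]
A_dagger = [[Fr(1, 6), Fr(1, 6)], [Fr(1, 6), Fr(1, 6)], [Fr(-1, 6), Fr(-1, 6)]]  # Example 6

P = matmul(A_dagger, A)   # part (a): should project F^3 onto N(L_A)^perp
Q = matmul(A, A_dagger)   # part (b): should project F^2 onto R(L_A)

for M in (P, Q):
    assert matmul(M, M) == M        # idempotent
    assert transpose(M) == M        # symmetric, i.e. self-adjoint over R

assert matmul(P, [[1], [1], [-1]]) == [[1], [1], [-1]]   # fixes the vector spanning N(L_A)^perp
assert matmul(P, [[1], [-1], [0]]) == [[0], [0], [0]]    # annihilates N(L_A) ...
assert matmul(P, [[1], [0], [1]]) == [[0], [0], [0]]     # ... (a basis of it: these two vectors)
assert matmul(Q, [[1], [1]]) == [[1], [1]]               # fixes R(L_A)
assert matmul(Q, [[1], [-1]]) == [[0], [0]]              # annihilates R(L_A)^perp
```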


Theorem 6.30. Consider the system of linear equations $Ax = b$, where
$A$ is an $m \times n$ matrix and $b \in F^m$. If $z = A^\dagger b$, then $z$ has the following
properties.

(a) If $Ax = b$ is consistent, then $z$ is the unique solution to the system
having minimum norm. That is, $z$ is a solution to the system, and if $y$
is any solution to the system, then $\|z\| \le \|y\|$ with equality if and only
if $z = y$.

(b) If $Ax = b$ is inconsistent, then $z$ is the unique best approximation to a
solution having minimum norm. That is, $\|Az - b\| \le \|Ay - b\|$ for any
$y \in F^n$, with equality if and only if $Az = Ay$. Furthermore, if $Az = Ay$,
then $\|z\| \le \|y\|$ with equality if and only if $z = y$.


Proof. For convenience, let $T = L_A$.

(a) Suppose that $Ax = b$ is consistent, and let $z = A^\dagger b$. Observe that
$b \in R(T)$, and therefore $Az = AA^\dagger b = TT^\dagger(b) = b$ by part (b) of the lemma.
Thus $z$ is a solution to the system. Now suppose that $y$ is any solution to the
system. Then

$$T^\dagger T(y) = A^\dagger A y = A^\dagger b = z,$$

and hence $z$ is the orthogonal projection of $y$ on $N(T)^\perp$ by part (a) of the
lemma. Therefore, by the corollary to Theorem 6.6 (p. 350), we have that
$\|z\| \le \|y\|$ with equality if and only if $z = y$.

(b) Suppose that $Ax = b$ is inconsistent. By the lemma, $Az = AA^\dagger b =
TT^\dagger(b)$ is the orthogonal projection of $b$ on $R(T)$; therefore, by the corollary
to Theorem 6.6 (p. 350), $Az$ is the vector in $R(T)$ nearest $b$. That is, if
$Ay$ is any other vector in $R(T)$, then $\|Az - b\| \le \|Ay - b\|$ with equality if
and only if $Az = Ay$.
Finally, suppose that $y$ is any vector in $F^n$ such that $Az = Ay = c$. Then

$$A^\dagger c = A^\dagger A z = A^\dagger A A^\dagger b = A^\dagger b = z$$

by Exercise 23; hence we may apply part (a) of this theorem to the system
$Ax = c$ to conclude that $\|z\| \le \|y\|$ with equality if and only if $z = y$. ∎


Note that the vector $z = A^\dagger b$ in Theorem 6.30 is the vector $x_0$ described
in Theorem 6.12 that arises in the least squares application on pages 360–364.

Example 7


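Consider the linear systems

$$\begin{aligned} x_1 + x_2 - x_3 &= 1 \\ x_1 + x_2 - x_3 &= 1 \end{aligned} \qquad\text{and}\qquad \begin{aligned} x_1 + x_2 - x_3 &= 1 \\ x_1 + x_2 - x_3 &= 2. \end{aligned}$$

The first system has infinitely many solutions. Let $A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix}$, the
coefficient matrix of the system, and let $b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. By Example 6,

$$A^\dagger = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix},$$

and therefore

$$z = A^\dagger b = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}$$

is the solution of minimal norm by Theorem 6.30(a).

The second system is obviously inconsistent. Let $b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. Thus, although

$$z = A^\dagger b = \frac{1}{6}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}$$

is not a solution to the second system, it is the "best approximation" to a
solution having minimum norm, as described in Theorem 6.30(b).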
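Both parts of Example 7 can be confirmed with exact rational arithmetic. The sketch below (helper names hypothetical) uses the $A^\dagger$ obtained in Example 6. For $b = (1, 1)$ it checks that $z$ solves the system and is shorter than another solution; for $b = (1, 2)$ it checks that the residual $Az - b$ is orthogonal to $R(L_A) = \operatorname{span}\{(1, 1)\}$, which is what makes $Az$ the point of $R(L_A)$ nearest $b$.

```python
from fractions import Fraction as Fr

A = [[1, 1, -1], [1, 1, -1]]
A_dagger = [[Fr(1, 6), Fr(1, 6)], [Fr(1, 6), Fr(1, 6)], [Fr(-1, 6), Fr(-1, 6)]]  # Example 6

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

# First system (consistent): b = (1, 1).
z1 = apply(A_dagger, [1, 1])
assert z1 == [Fr(1, 3), Fr(1, 3), Fr(-1, 3)]
assert apply(A, z1) == [1, 1]                               # z1 is a solution
y = [1, 0, 0]                                               # another solution ...
assert apply(A, y) == [1, 1] and dot(z1, z1) < dot(y, y)    # ... with larger norm

# Second system (inconsistent): b = (1, 2).
b = [1, 2]
z2 = apply(A_dagger, b)
assert z2 == [Fr(1, 2), Fr(1, 2), Fr(-1, 2)]
residual = [p - q for p, q in zip(apply(A, z2), b)]
assert residual == [Fr(1, 2), Fr(-1, 2)]    # nonzero, so z2 is not a solution
assert dot(residual, [1, 1]) == 0           # but A z2 - b is orthogonal to R(L_A)
other = [p - q for p, q in zip(apply(A, [1, 1, 0]), b)]
assert dot(residual, residual) < dot(other, other)   # a competitor does worse
```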
EXERCISES


1. Label the following statements as true or false.


(a) The singular values of any linear operator on a finite-dimensional 
vector space are also eigenvalues of the operator. 

(b) The singular values of any matrix $A$ are the eigenvalues of $A^*A$.

(c) For any matrix $A$ and any scalar $c$, if $\sigma$ is a singular value of $A$,
then $|c|\sigma$ is a singular value of $cA$.

(d) The singular values of any linear operator are nonnegative.

(e) If $\lambda$ is an eigenvalue of a self-adjoint matrix $A$, then $\lambda$ is a singular
value of $A$.

(f) For any $m \times n$ matrix $A$ and any $b \in F^m$, the vector $A^\dagger b$ is a solution
to $Ax = b$.

(g) The pseudoinverse of any linear operator exists even if the operator 
is not invertible. 


2. Let $T\colon V \to W$ be a linear transformation of rank $r$, where $V$ and $W$
are finite-dimensional inner product spaces. In each of the following,
find orthonormal bases $\{v_1, v_2, \ldots, v_n\}$ for $V$ and $\{u_1, u_2, \ldots, u_m\}$ for
$W$, and the nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ of $T$ such that
$T(v_i) = \sigma_i u_i$ for $1 \le i \le r$.

(a) $T\colon R^2 \to R^3$ defined by $T(x_1, x_2) = (x_1, x_1 + x_2, x_1 - x_2)$

(b) $T\colon P_2(R) \to P_1(R)$, where $T(f(x)) = f''(x)$, and the inner products
are defined as in Example 1

(c) Let $V = W = \operatorname{span}(\{1, \sin x, \cos x\})$ with the inner product defined
by $\langle f, g\rangle = \int_0^{2\pi} f(t)g(t)\,dt$, and $T$ is defined by $T(f) = f' + 2f$

(d) $T\colon C^2 \to C^2$ defined by $T(z_1, z_2) = ((1 - i)z_2, (1 + i)z_1 + z_2)$
3. Find a singular value decomposition for each of the following matrices.


oe 10 1 p 
(a) {| 1 1 (b) (c) 
Ae oa € 0 e) : 


iy ae. pte Oh adti-o Seis 12 
(ay ld 2 0), qe) Ge Z) Gye. O24 
1) 0p at oo oe a (es Oe 


4. Find a polar decomposition for each of the following matrices.
1 1 20 4 O 
(a) iG ie >) (b) 0 O 1 
4 20 0 


5. Find an explicit formula for each of the following expressions.
(a) $T^\dagger(x_1, x_2, x_3)$, where $T$ is the linear transformation of Exercise 2(a)

(b) $T^\dagger(a + bx + cx^2)$, where $T$ is the linear transformation of Exercise 2(b)

(c) $T^\dagger(a + b\sin x + c\cos x)$, where $T$ is the linear transformation of
Exercise 2(c)

(d) $T^\dagger(z_1, z_2)$, where $T$ is the linear transformation of Exercise 2(d)


6. Use the results of Exercise 3 to find the pseudoinverse of each of the
following matrices.


C4 
i 4 

i Oe i 01 

@ (1 1} & (aoe Rompe: 

11 
ee ee ae dos 44) aa 
(d) {1 -1 0] Ce) Ge 5) (3 a) a Oe a 
ee | a ee ae 


7. For each of the given linear transformations $T\colon V \to W$,
(i) Describe the subspace $Z_1$ of $V$ such that $T^\dagger T$ is the orthogonal
projection of $V$ on $Z_1$.
(ii) Describe the subspace $Z_2$ of $W$ such that $TT^\dagger$ is the orthogonal
projection of $W$ on $Z_2$.


(a) $T$ is the linear transformation of Exercise 2(a)
(b) $T$ is the linear transformation of Exercise 2(b)
(c) $T$ is the linear transformation of Exercise 2(c)
(d) $T$ is the linear transformation of Exercise 2(d)


8. For each of the given systems of linear equations,
(i) If the system is consistent, find the unique solution having minimum norm.
(ii) If the system is inconsistent, find the "best approximation to a
solution" having minimum norm, as described in Theorem 6.30(b).
(Use your answers to parts (a) and (f) of Exercise 6.)


tyr tg 1 t+ 2X2 x3 w4 = 
(a) tr mg 2 (b) Ly = 2x3 + v4 = —l 
et) Os es et) 0? i 0 vy v9 X3 LA = 


9. Let $V$ and $W$ be finite-dimensional inner product spaces over $F$, and suppose
that $\{v_1, v_2, \ldots, v_n\}$ and $\{u_1, u_2, \ldots, u_m\}$ are orthonormal bases
for $V$ and $W$, respectively. Let $T\colon V \to W$ be a linear transformation of
rank $r$, and suppose that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ are such that

$$T(v_i) = \begin{cases} \sigma_i u_i & \text{if } 1 \le i \le r,\\ 0 & \text{if } r < i. \end{cases}$$

(a) Prove that $\{u_1, u_2, \ldots, u_m\}$ is a set of eigenvectors of $TT^*$ with
corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$, where

$$\lambda_i = \begin{cases} \sigma_i^2 & \text{if } 1 \le i \le r,\\ 0 & \text{if } r < i. \end{cases}$$

(b) Let $A$ be an $m \times n$ matrix with real or complex entries. Prove that
the nonzero singular values of $A$ are the positive square roots of
the nonzero eigenvalues of $AA^*$, including repetitions.

(c) Prove that $TT^*$ and $T^*T$ have the same nonzero eigenvalues, including repetitions.

(d) State and prove a result for matrices analogous to (c).


10. Use Exercise 8 of Section 2.5 to obtain another proof of Theorem 6.27,
the singular value decomposition theorem for matrices.


11. This exercise relates the singular values of a well-behaved linear operator
or matrix to its eigenvalues.


(a) Let $T$ be a normal linear operator on an $n$-dimensional inner product
space with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Prove that the singular
values of $T$ are $|\lambda_1|, |\lambda_2|, \ldots, |\lambda_n|$.

(b) State and prove a result for matrices analogous to (a).


12. Let $A$ be a normal matrix with an orthonormal basis of eigenvectors
$\beta = \{v_1, v_2, \ldots, v_n\}$ and corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let
$V$ be the $n \times n$ matrix whose columns are the vectors in $\beta$. Prove that
for each $i$ there is a scalar $\theta_i$ of absolute value 1 such that if $U$ is the
$n \times n$ matrix with $\theta_i v_i$ as column $i$ and $\Sigma$ is the diagonal matrix such
that $\Sigma_{ii} = |\lambda_i|$ for each $i$, then $U\Sigma V^*$ is a singular value decomposition
of $A$.


13. Prove that if $A$ is a positive semidefinite matrix, then the singular values
of $A$ are the same as the eigenvalues of $A$.


14. Prove that if $A$ is a positive definite matrix and $A = U\Sigma V^*$ is a singular
value decomposition of $A$, then $U = V$.


15. Let $A$ be a square matrix with a polar decomposition $A = WP$.


(a) Prove that $A$ is normal if and only if $WP^2 = P^2W$.
(b) Use (a) to prove that $A$ is normal if and only if $WP = PW$.


16. Let $A$ be a square matrix. Prove an alternate form of the polar
decomposition for $A$: There exists a unitary matrix $W$ and a positive
semidefinite matrix $P$ such that $A = PW$.


17. Let $T$ and $U$ be linear operators on $R^2$ defined for all $(x_1, x_2) \in R^2$ by
$T(x_1, x_2) = (x_1, 0)$ and $U(x_1, x_2) = (x_1 + x_2, 0)$.


(a) Prove that $(UT)^\dagger \ne T^\dagger U^\dagger$.
(b) Exhibit matrices $A$ and $B$ such that $AB$ is defined, but $(AB)^\dagger \ne
B^\dagger A^\dagger$.


18. Let $A$ be an $m \times n$ matrix. Prove the following results.


(a) For any $m \times m$ unitary matrix $G$, $(GA)^\dagger = A^\dagger G^*$.
(b) For any $n \times n$ unitary matrix $H$, $(AH)^\dagger = H^* A^\dagger$.


19. Let $A$ be a matrix with real or complex entries. Prove the following
results.


(a) The nonzero singular values of $A$ are the same as the nonzero
singular values of $A^*$, which are the same as the nonzero singular
values of $A^t$.

(b) $(A^\dagger)^* = (A^*)^\dagger$.

(c) $(A^\dagger)^t = (A^t)^\dagger$.


20. Let $A$ be a square matrix such that $A^2 = O$. Prove that $(A^\dagger)^2 = O$.


21. Let $V$ and $W$ be finite-dimensional inner product spaces, and let
$T\colon V \to W$ be linear. Prove the following results.

(a) $TT^\dagger T = T$.

(b) $T^\dagger T T^\dagger = T^\dagger$.

(c) Both $T^\dagger T$ and $TT^\dagger$ are self-adjoint.


The preceding three statements are called the Penrose conditions,
and they characterize the pseudoinverse of a linear transformation as
shown in Exercise 22.


22. Let $V$ and $W$ be finite-dimensional inner product spaces. Let $T\colon V \to W$
and $U\colon W \to V$ be linear transformations such that $TUT = T$, $UTU = U$,
and both $UT$ and $TU$ are self-adjoint. Prove that $U = T^\dagger$.


23. State and prove a result for matrices that is analogous to the result of
Exercise 21.


24. State and prove a result for matrices that is analogous to the result of
Exercise 22.


25. Let $V$ and $W$ be finite-dimensional inner product spaces, and let
$T\colon V \to W$ be linear. Prove the following results.

(a) If $T$ is one-to-one, then $T^*T$ is invertible and $T^\dagger = (T^*T)^{-1}T^*$.
(b) If $T$ is onto, then $TT^*$ is invertible and $T^\dagger = T^*(TT^*)^{-1}$.
26. Let $V$ and $W$ be finite-dimensional inner product spaces with orthonormal
bases $\beta$ and $\gamma$, respectively, and let $T\colon V \to W$ be linear. Prove
that $([T]_\beta^\gamma)^\dagger = [T^\dagger]_\gamma^\beta$.


27. Let $V$ and $W$ be finite-dimensional inner product spaces, and let
$T\colon V \to W$ be a linear transformation. Prove part (b) of the lemma
to Theorem 6.30: $TT^\dagger$ is the orthogonal projection of $W$ on $R(T)$.


6.8* BILINEAR AND QUADRATIC FORMS 


There is a certain class of scalar-valued functions of two variables defined on 
a vector space that arises in the study of such diverse subjects as geometry 
and multivariable calculus. This is the class of bilinear forms. We study the 
basic properties of this class with a special emphasis on symmetric bilinear 
forms, and we consider some of its applications to quadratic surfaces and 
multivariable calculus. 


Bilinear Forms 


Definition. Let $V$ be a vector space over a field $F$. A function $H$ from
the set $V \times V$ of ordered pairs of vectors to $F$ is called a bilinear form on $V$
if $H$ is linear in each variable when the other variable is held fixed; that is,
$H$ is a bilinear form on $V$ if

(a) $H(ax_1 + x_2, y) = aH(x_1, y) + H(x_2, y)$ for all $x_1, x_2, y \in V$ and $a \in F$
(b) $H(x, ay_1 + y_2) = aH(x, y_1) + H(x, y_2)$ for all $x, y_1, y_2 \in V$ and $a \in F$.


We denote the set of all bilinear forms on V by B(V). Observe that an 
inner product on a vector space is a bilinear form if the underlying field is 
real, but not if the underlying field is complex. 


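Example 1
Define a function $H\colon R^2 \times R^2 \to R$ by

$$H\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = 2a_1b_1 + 3a_1b_2 + 4a_2b_1 - a_2b_2 \quad\text{for } \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2.$$

We could verify directly that $H$ is a bilinear form on $R^2$. However, it is more
enlightening and less tedious to observe that if

$$A = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}, \quad x = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad\text{and}\quad y = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},$$

then

$$H(x, y) = x^t A y.$$

The bilinearity of $H$ now follows directly from the distributive property of
matrix multiplication over matrix addition.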
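A quick randomized spot check (not a proof) of the two facts just used: the coordinate formula agrees with $x^tAy$, and $H$ satisfies conditions (a) and (b) of the definition. Integer vectors keep the arithmetic exact, and a fixed seed makes the run reproducible; the helper names `H`, `xt_A_y`, and `comb` are hypothetical.

```python
import random

A = [[2, 3], [4, -1]]

def H(x, y):
    """The bilinear form of Example 1, written out in coordinates."""
    (a1, a2), (b1, b2) = x, y
    return 2 * a1 * b1 + 3 * a1 * b2 + 4 * a2 * b1 - a2 * b2

def xt_A_y(x, y):
    """x^t A y for vectors x, y in R^2."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

def comb(a, u, v):
    """The vector a*u + v."""
    return [a * p + q for p, q in zip(u, v)]

rng = random.Random(0)                  # fixed seed: reproducible spot check
for _ in range(200):
    x1, x2, y = ([rng.randint(-9, 9) for _ in range(2)] for _ in range(3))
    a = rng.randint(-9, 9)
    assert H(x1, y) == xt_A_y(x1, y)
    assert H(comb(a, x1, x2), y) == a * H(x1, y) + H(x2, y)    # condition (a)
    assert H(y, comb(a, x1, x2)) == a * H(y, x1) + H(y, x2)    # condition (b)
```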
The preceding bilinear form is a special case of the next example.


Example 2
Let $V = F^n$, where the vectors are considered as column vectors. For any
$A \in M_{n\times n}(F)$, define $H\colon V \times V \to F$ by

$$H(x, y) = x^t A y \quad\text{for } x, y \in V.$$

Notice that since $x$ and $y$ are $n \times 1$ matrices and $A$ is an $n \times n$ matrix, $H(x, y)$
is a $1 \times 1$ matrix. We identify this matrix with its single entry. The bilinearity
of $H$ follows as in Example 1. For example, for $a \in F$ and $x_1, x_2, y \in V$, we
have

$$\begin{aligned} H(ax_1 + x_2, y) &= (ax_1 + x_2)^t A y = (ax_1^t + x_2^t) A y \\ &= a x_1^t A y + x_2^t A y \\ &= aH(x_1, y) + H(x_2, y). \end{aligned}$$


We list several properties possessed by all bilinear forms. Their proofs are
left to the reader (see Exercise 2).

For any bilinear form $H$ on a vector space $V$ over a field $F$, the following
properties hold.


1. If, for any $x \in V$, the functions $L_x, R_x\colon V \to F$ are defined by
$$L_x(y) = H(x, y) \quad\text{and}\quad R_x(y) = H(y, x) \quad\text{for all } y \in V,$$
then $L_x$ and $R_x$ are linear.
2. $H(0, x) = H(x, 0) = 0$ for all $x \in V$.
3. For all $x, y, z, w \in V$,
$$H(x + y, z + w) = H(x, z) + H(x, w) + H(y, z) + H(y, w).$$
4. If $J\colon V \times V \to F$ is defined by $J(x, y) = H(y, x)$, then $J$ is a bilinear
form.


Definitions. Let $V$ be a vector space, let $H_1$ and $H_2$ be bilinear forms
on $V$, and let $a$ be a scalar. We define the sum $H_1 + H_2$ and the scalar
product $aH_1$ by the equations

$$(H_1 + H_2)(x, y) = H_1(x, y) + H_2(x, y)$$
and
$$(aH_1)(x, y) = a\bigl(H_1(x, y)\bigr) \quad\text{for all } x, y \in V.$$


The following theorem is an immediate consequence of the definitions. 
Theorem 6.31. For any vector space $V$, the sum of two bilinear forms
and the product of a scalar and a bilinear form on $V$ are again bilinear forms
on $V$. Furthermore, $B(V)$ is a vector space with respect to these operations.


Proof. Exercise. ∎


Let $\beta = \{v_1, v_2, \ldots, v_n\}$ be an ordered basis for an $n$-dimensional vector
space $V$, and let $H \in B(V)$. We can associate with $H$ an $n \times n$ matrix $A$
whose entry in row $i$ and column $j$ is defined by

$$A_{ij} = H(v_i, v_j) \quad\text{for } i, j = 1, 2, \ldots, n.$$

Definition. The matrix $A$ above is called the matrix representation
of $H$ with respect to the ordered basis $\beta$ and is denoted by $\psi_\beta(H)$.


We can therefore regard $\psi_\beta$ as a mapping from $B(V)$ to $M_{n\times n}(F)$, where
$F$ is the field of scalars for $V$, that takes a bilinear form $H$ into its matrix
representation $\psi_\beta(H)$. We first consider an example and then show that $\psi_\beta$
is an isomorphism.
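Example 3


Consider the bilinear form $H$ of Example 1, and let

$$\beta = \left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\} \quad\text{and}\quad B = \psi_\beta(H).$$

Then

$$\begin{aligned}
B_{11} &= H\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right) = 2 + 3 + 4 - 1 = 8, \\
B_{12} &= H\left(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = 2 - 3 + 4 + 1 = 4, \\
B_{21} &= H\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right) = 2 + 3 - 4 + 1 = 2,
\end{aligned}$$

and

$$B_{22} = H\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = 2 - 3 - 4 - 1 = -6.$$

So

$$\psi_\beta(H) = \begin{pmatrix} 8 & 4 \\ 2 & -6 \end{pmatrix}.$$

If $\gamma$ is the standard ordered basis for $R^2$, the reader can verify that

$$\psi_\gamma(H) = \begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}. \qquad ♦$$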
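The computation in Example 3 is easy to automate, and it also gives a first numerical look at Theorem 6.33 below. In this sketch (names hypothetical), `matrix_rep` builds $\psi(H)$ entry by entry from $H(v_i, v_j)$. If $Q$ is the matrix whose columns are the vectors of $\beta$, then $Q$ changes $\beta$-coordinates into standard coordinates, so with the two bases in these roles Theorem 6.33 predicts $\psi_\beta(H) = Q^t\,\psi_\gamma(H)\,Q$.

```python
def H(x, y):
    """The bilinear form of Example 1."""
    (a1, a2), (b1, b2) = x, y
    return 2 * a1 * b1 + 3 * a1 * b2 + 4 * a2 * b1 - a2 * b2

def matrix_rep(H, basis):
    """psi_basis(H): the (i, j) entry is H(v_i, v_j)."""
    return [[H(vi, vj) for vj in basis] for vi in basis]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

beta = [(1, 1), (1, -1)]
gamma = [(1, 0), (0, 1)]                # standard ordered basis

B = matrix_rep(H, beta)
A = matrix_rep(H, gamma)
assert B == [[8, 4], [2, -6]] and A == [[2, 3], [4, -1]]

# Columns of Q are the vectors of beta (Q changes beta- into standard coordinates).
Q = [[1, 1], [1, -1]]
Qt = [list(row) for row in zip(*Q)]
assert matmul(matmul(Qt, A), Q) == B    # psi_beta(H) = Q^t psi_gamma(H) Q
```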
Theorem 6.32. For any $n$-dimensional vector space $V$ over $F$ and any
ordered basis $\beta$ for $V$, $\psi_\beta\colon B(V) \to M_{n\times n}(F)$ is an isomorphism.


Proof. We leave the proof that $\psi_\beta$ is linear to the reader.

To show that $\psi_\beta$ is one-to-one, suppose that $\psi_\beta(H) = O$ for some $H \in
B(V)$. Fix $v_i \in \beta$, and recall the mapping $L_{v_i}\colon V \to F$, which is linear by
property 1 on page 423. By hypothesis, $L_{v_i}(v_j) = H(v_i, v_j) = 0$ for all $v_j \in \beta$.
Hence $L_{v_i}$ is the zero transformation from $V$ to $F$. So

$$H(v_i, x) = L_{v_i}(x) = 0 \quad\text{for all } x \in V \text{ and } v_i \in \beta. \tag{7}$$

Next fix an arbitrary $y \in V$, and recall the linear mapping $R_y\colon V \to F$ defined
in property 1 on page 423. By (7), $R_y(v_i) = H(v_i, y) = 0$ for all $v_i \in \beta$, and
hence $R_y$ is the zero transformation. So $H(x, y) = R_y(x) = 0$ for all $x, y \in V$.
Thus $H$ is the zero bilinear form, and therefore $\psi_\beta$ is one-to-one.

To show that $\psi_\beta$ is onto, consider any $A \in M_{n\times n}(F)$. Recall the isomorphism
$\phi_\beta\colon V \to F^n$ defined in Section 2.4. For $x \in V$, we view $\phi_\beta(x) \in F^n$ as
a column vector. Let $H\colon V \times V \to F$ be the mapping defined by

$$H(x, y) = [\phi_\beta(x)]^t A [\phi_\beta(y)] \quad\text{for all } x, y \in V.$$

A slight embellishment of the method of Example 2 can be used to prove that
$H \in B(V)$. We show that $\psi_\beta(H) = A$. Let $v_i, v_j \in \beta$. Then $\phi_\beta(v_i) = e_i$ and
$\phi_\beta(v_j) = e_j$; hence, for any $i$ and $j$,

$$H(v_i, v_j) = [\phi_\beta(v_i)]^t A [\phi_\beta(v_j)] = e_i^t A e_j = A_{ij}.$$

We conclude that $\psi_\beta(H) = A$ and $\psi_\beta$ is onto. ∎


Corollary 1. For any $n$-dimensional vector space $V$, $B(V)$ has dimension $n^2$.


Proof. Exercise. ∎


The following corollary is easily established by reviewing the proof of 
Theorem 6.32. 


Corollary 2. Let $V$ be an $n$-dimensional vector space over $F$ with
ordered basis $\beta$. If $H \in B(V)$ and $A \in M_{n\times n}(F)$, then $\psi_\beta(H) = A$ if and
only if $H(x, y) = [\phi_\beta(x)]^t A [\phi_\beta(y)]$ for all $x, y \in V$.


The following result is now an immediate consequence of Corollary 2. 


Corollary 3. Let $F$ be a field, $n$ a positive integer, and $\beta$ be the standard
ordered basis for $F^n$. Then for any $H \in B(F^n)$, there exists a unique matrix
$A \in M_{n\times n}(F)$, namely, $A = \psi_\beta(H)$, such that

$$H(x, y) = x^t A y \quad\text{for all } x, y \in F^n.$$
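Example 4
Define a function $H\colon R^2 \times R^2 \to R$ by

$$H\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = \det\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = a_1b_2 - a_2b_1 \quad\text{for } \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2.$$

It can be shown that $H$ is a bilinear form. We find the matrix $A$ in Corollary 3
such that $H(x, y) = x^t A y$ for all $x, y \in R^2$.


Since $A_{ij} = H(e_i, e_j)$ for all $i$ and $j$, we have

$$A_{11} = \det\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = 0, \qquad A_{12} = \det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1,$$

$$A_{21} = \det\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1, \quad\text{and}\quad A_{22} = \det\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} = 0.$$

Therefore $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. ♦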
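Corollary 3 amounts to a short recipe: evaluate $H$ on pairs of standard basis vectors. A minimal sketch for the determinant form of Example 4 (names illustrative), followed by a spot check that $x^tAy$ reproduces $H$ on a few integer vectors:

```python
def H(x, y):
    """H(x, y) = det of the 2 x 2 matrix with columns x and y (Example 4)."""
    (a1, a2), (b1, b2) = x, y
    return a1 * b2 - a2 * b1

e = [(1, 0), (0, 1)]                          # standard ordered basis e1, e2
A = [[H(ei, ej) for ej in e] for ei in e]     # A_ij = H(e_i, e_j), as in Corollary 3
assert A == [[0, 1], [-1, 0]]

# Spot-check H(x, y) = x^t A y on a few integer vectors.
for x in [(2, 5), (-1, 3), (4, 0)]:
    for y in [(4, -2), (0, 7), (1, 1)]:
        assert H(x, y) == sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))
```

Here $A$ is not symmetric, matching the fact that $H(y, x) = -H(x, y)$ for this form.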

There is an analogy between bilinear forms and linear operators on finite- 
dimensional vector spaces in that both are associated with unique square 
matrices and the correspondences depend on the choice of an ordered basis for 
the vector space. As in the case of linear operators, one can pose the following 
question: How does the matrix corresponding to a fixed bilinear form change 
when the ordered basis is changed? As we have seen, the corresponding 
question for matrix representations of linear operators leads to the definition 
of the similarity relation on square matrices. In the case of bilinear forms, 
the corresponding question leads to another relation on square matrices, the 
congruence relation. 


Definition. Let $A, B \in M_{n\times n}(F)$. Then $B$ is said to be congruent to
$A$ if there exists an invertible matrix $Q \in M_{n\times n}(F)$ such that $B = Q^t A Q$.


Observe that the relation of congruence is an equivalence relation (see 
Exercise 12). 

The next theorem relates congruence to the matrix representation of a 
bilinear form. 


Theorem 6.33. Let $V$ be a finite-dimensional vector space with ordered
bases $\beta = \{v_1, v_2, \ldots, v_n\}$ and $\gamma = \{w_1, w_2, \ldots, w_n\}$, and let $Q$ be the change
of coordinate matrix changing $\gamma$-coordinates into $\beta$-coordinates. Then, for
any $H \in B(V)$, we have $\psi_\gamma(H) = Q^t \psi_\beta(H) Q$. Therefore $\psi_\gamma(H)$ is congruent
to $\psi_\beta(H)$.


Proof. There are essentially two proofs of this theorem. One involves a 
direct computation, while the other follows immediately from a clever obser- 
vation. We give the more direct proof here, leaving the other proof for the 
exercises (see Exercise 13). 
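Suppose that $A = \psi_\beta(H)$ and $B = \psi_\gamma(H)$. Then for $1 \le i, j \le n$,

$$w_i = \sum_{k=1}^{n} Q_{ki} v_k \quad\text{and}\quad w_j = \sum_{r=1}^{n} Q_{rj} v_r.$$

Thus

$$\begin{aligned}
B_{ij} = H(w_i, w_j) &= H\Bigl(\sum_{k=1}^{n} Q_{ki} v_k,\; w_j\Bigr) \\
&= \sum_{k=1}^{n} Q_{ki} H(v_k, w_j) \\
&= \sum_{k=1}^{n} Q_{ki} H\Bigl(v_k,\; \sum_{r=1}^{n} Q_{rj} v_r\Bigr) \\
&= \sum_{k=1}^{n} Q_{ki} \sum_{r=1}^{n} Q_{rj} H(v_k, v_r) \\
&= \sum_{k=1}^{n} Q_{ki} \sum_{r=1}^{n} Q_{rj} A_{kr} \\
&= \sum_{k=1}^{n} Q_{ki} \sum_{r=1}^{n} A_{kr} Q_{rj} \\
&= \sum_{k=1}^{n} Q_{ki} (AQ)_{kj} \\
&= \sum_{k=1}^{n} (Q^t)_{ik} (AQ)_{kj} = (Q^t A Q)_{ij}.
\end{aligned}$$

Hence $B = Q^t A Q$. ∎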


The following result is the converse of Theorem 6.33. 


Corollary. Let $V$ be an $n$-dimensional vector space with ordered basis $\beta$,
and let $H$ be a bilinear form on $V$. For any $n \times n$ matrix $B$, if $B$ is congruent
to $\psi_\beta(H)$, then there exists an ordered basis $\gamma$ for $V$ such that $\psi_\gamma(H) = B$.
Furthermore, if $B = Q^t \psi_\beta(H) Q$ for some invertible matrix $Q$, then $Q$ changes
$\gamma$-coordinates into $\beta$-coordinates.


Proof. Suppose that $B = Q^t \psi_\beta(H) Q$ for some invertible matrix $Q$ and
that $\beta = \{v_1, v_2, \ldots, v_n\}$. Let $\gamma = \{w_1, w_2, \ldots, w_n\}$, where

$$w_j = \sum_{i=1}^{n} Q_{ij} v_i \quad\text{for } 1 \le j \le n.$$


428 Chap. 6 Inner Product Spaces 


Since $Q$ is invertible, $\gamma$ is an ordered basis for $V$, and $Q$ is the change of
coordinate matrix that changes $\gamma$-coordinates into $\beta$-coordinates. Therefore,
by Theorem 6.33,

$$B = Q^t \psi_\beta(H) Q = \psi_\gamma(H).$$
∎


Symmetric Bilinear Forms 


Like the diagonalization problem for linear operators, there is an analogous 
diagonalization problem for bilinear forms, namely, the problem of determin- 
ing those bilinear forms for which there are diagonal matrix representations. 
As we will see, there is a close relationship between diagonalizable bilinear 
forms and those that are called symmetric. 


Definition. A bilinear form $H$ on a vector space $V$ is symmetric if
$H(x, y) = H(y, x)$ for all $x, y \in V$.


As the name suggests, symmetric bilinear forms correspond to symmetric 
matrices. 


Theorem 6.34. Let $H$ be a bilinear form on a finite-dimensional vector
space $V$, and let $\beta$ be an ordered basis for $V$. Then $H$ is symmetric if and
only if $\psi_\beta(H)$ is symmetric.


Proof. Let $\beta = \{v_1, v_2, \ldots, v_n\}$ and $B = \psi_\beta(H)$.
First assume that $H$ is symmetric. Then for $1 \le i, j \le n$,

$$B_{ij} = H(v_i, v_j) = H(v_j, v_i) = B_{ji},$$

and it follows that $B$ is symmetric.

Conversely, suppose that $B$ is symmetric. Let $J\colon V \times V \to F$, where $F$ is
the field of scalars for $V$, be the mapping defined by $J(x, y) = H(y, x)$ for all
$x, y \in V$. By property 4 on page 423, $J$ is a bilinear form. Let $C = \psi_\beta(J)$.
Then, for $1 \le i, j \le n$,

$$C_{ij} = J(v_i, v_j) = H(v_j, v_i) = B_{ji} = B_{ij}.$$

Thus $C = B$. Since $\psi_\beta$ is one-to-one, we have $J = H$. Hence $H(y, x) =
J(x, y) = H(x, y)$ for all $x, y \in V$, and therefore $H$ is symmetric. ∎


Definition. A bilinear form $H$ on a finite-dimensional vector space $V$ is
called diagonalizable if there is an ordered basis $\beta$ for $V$ such that $\psi_\beta(H)$
is a diagonal matrix.


Corollary. Let H be a diagonalizable bilinear form on a finite-dimensional 
vector space V. Then H is symmetric. 
Proof. Suppose that $H$ is diagonalizable. Then there is an ordered basis $\beta$
for $V$ such that $\psi_\beta(H) = D$ is a diagonal matrix. Trivially, $D$ is a symmetric
matrix, and hence, by Theorem 6.34, $H$ is symmetric. ∎


Unfortunately, the converse is not true, as is illustrated by the following 
example. 


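Example 5
Let $F = Z_2$, $V = F^2$, and $H\colon V \times V \to F$ be the bilinear form defined by

$$H\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = a_1b_2 + a_2b_1.$$

Clearly $H$ is symmetric. In fact, if $\beta$ is the standard ordered basis for $V$, then

$$A = \psi_\beta(H) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

a symmetric matrix. We show that $H$ is not diagonalizable.


By way of contradiction, suppose that $H$ is diagonalizable. Then there is
an ordered basis $\gamma$ for $V$ such that $B = \psi_\gamma(H)$ is a diagonal matrix. So by
Theorem 6.33, there exists an invertible matrix $Q$ such that $B = Q^t A Q$. Since
$Q$ is invertible, it follows that $\operatorname{rank}(B) = \operatorname{rank}(A) = 2$, and consequently the
diagonal entries of $B$ are nonzero. Since the only nonzero scalar of $F$ is 1,

$$B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Suppose that

$$Q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Then

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = B = Q^t A Q = \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} ac + ac & bc + ad \\ bc + ad & bd + bd \end{pmatrix}.$$

But $p + p = 0$ for all $p \in F$; hence $ac + ac = 0$. Thus, comparing the row
1, column 1 entries of the matrices in the equation above, we conclude that
$1 = 0$, a contradiction. Therefore $H$ is not diagonalizable.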
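Because $Z_2$ is finite, the conclusion of Example 5 can also be confirmed by brute force. The sketch below (illustrative, with hypothetical names) forms $Q^tAQ$ modulo 2 for every invertible $2 \times 2$ matrix $Q$ over $Z_2$. Every result turns out to equal $A$ itself, since $ad + bc \equiv \det Q \equiv 1 \pmod 2$, so no matrix congruent to $A$ is diagonal.

```python
from itertools import product

P = 2                                     # work in Z_2
A = [[0, 1], [1, 0]]                      # psi_beta(H) for the standard ordered basis

def matmul_mod(X, Y):
    """2 x 2 matrix product with entries reduced mod P."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % P for j in range(2)] for i in range(2)]

congruent = []
for a, b, c, d in product(range(P), repeat=4):
    if (a * d - b * c) % P == 0:
        continue                          # skip singular Q
    Q = [[a, b], [c, d]]
    Qt = [[a, c], [b, d]]
    congruent.append(matmul_mod(matmul_mod(Qt, A), Q))

assert len(congruent) == 6                # six invertible 2 x 2 matrices over Z_2
assert all(M == A for M in congruent)     # Q^t A Q = A every time
assert not any(M[0][1] == 0 for M in congruent)   # so none of them is diagonal
```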


The bilinear form of Example 5 is an anomaly. Its failure to be diagonal-
izable is due to the fact that the scalar field $Z_2$ is of characteristic two. Recall
from Appendix C that a field $F$ is of characteristic two if $1 + 1 = 0$ in $F$.
If $F$ is not of characteristic two, then $1 + 1 = 2$ has a multiplicative inverse,
which we denote by $1/2$.

Before proving the converse of the corollary to Theorem 6.34 for scalar 
fields that are not of characteristic two, we establish the following lemma. 


Lemma. Let $H$ be a nonzero symmetric bilinear form on a vector space
$V$ over a field $F$ not of characteristic two. Then there is a vector $x$ in $V$ such
that $H(x, x) \ne 0$.


Proof. Since $H$ is nonzero, we can choose vectors $u, v \in V$ such that
$H(u, v) \ne 0$. If $H(u, u) \ne 0$ or $H(v, v) \ne 0$, there is nothing to prove.
Otherwise, set $x = u + v$. Then, since $H$ is symmetric,

$$H(x, x) = H(u, u) + H(u, v) + H(v, u) + H(v, v) = 2H(u, v) \ne 0$$

because $2 \ne 0$ and $H(u, v) \ne 0$. ∎


Theorem 6.35. Let V be a finite-dimensional vector space over a field 
F not of characteristic two. Then every symmetric bilinear form on V is 
diagonalizable. 


Proof. We use mathematical induction on $n = \dim(V)$. If $n = 1$, then every
element of $B(V)$ is diagonalizable. Now suppose that the theorem is valid
for vector spaces of dimension less than $n$ for some fixed integer $n > 1$, and
suppose that $\dim(V) = n$. If $H$ is the zero bilinear form on $V$, then trivially $H$
is diagonalizable; so suppose that $H$ is a nonzero symmetric bilinear form on
$V$. By the lemma, there exists a nonzero vector $x$ in $V$ such that $H(x, x) \ne 0$.
Recall the function $L_x\colon V \to F$ defined by $L_x(y) = H(x, y)$ for all $y \in V$. By
property 1 on page 423, $L_x$ is linear. Furthermore, since $L_x(x) = H(x, x) \ne 0$,
$L_x$ is nonzero. Consequently, $\operatorname{rank}(L_x) = 1$, and hence $\dim(N(L_x)) = n - 1$.

The restriction of $H$ to $N(L_x)$ is obviously a symmetric bilinear form on
a vector space of dimension $n - 1$. Thus, by the induction hypothesis, there
exists an ordered basis $\{v_1, v_2, \ldots, v_{n-1}\}$ for $N(L_x)$ such that $H(v_i, v_j) = 0$
for $i \ne j$ $(1 \le i, j \le n - 1)$. Set $v_n = x$. Then $v_n \notin N(L_x)$, and so
$\beta = \{v_1, v_2, \ldots, v_n\}$ is an ordered basis for $V$. In addition, because each $v_i$
with $i < n$ lies in $N(L_x)$, we have $H(v_i, v_n) = H(v_n, v_i) = L_x(v_i) = 0$ for
$i = 1, 2, \ldots, n - 1$. We conclude that $\psi_\beta(H)$ is a diagonal
matrix, and therefore $H$ is diagonalizable. ∎


Corollary. Let $F$ be a field that is not of characteristic two. If $A \in
M_{n\times n}(F)$ is a symmetric matrix, then $A$ is congruent to a diagonal matrix.


Proof. Exercise. ∎
Diagonalization of Symmetric Matrices


Let $A$ be a symmetric $n \times n$ matrix with entries from a field $F$ not of
characteristic two. By the corollary to Theorem 6.35, there are matrices
$Q, D \in M_{n\times n}(F)$ such that $Q$ is invertible, $D$ is diagonal, and $Q^t A Q = D$. We
now give a method for computing $Q$ and $D$. This method requires familiarity
with elementary matrices and their properties, which the reader may wish to
review in Section 3.1.

If $E$ is an elementary $n \times n$ matrix, then $AE$ can be obtained by performing
an elementary column operation on $A$. By Exercise 21, $E^t A$ can be obtained
by performing the same operation on the rows of $A$ rather than on its columns.
Thus $E^t A E$ can be obtained from $A$ by performing an elementary operation
on the columns of $A$ and then performing the same operation on the rows
of $AE$. (Note that the order of the operations can be reversed because of
the associative property of matrix multiplication.) Suppose that $Q$ is an
invertible matrix and $D$ is a diagonal matrix such that $Q^t A Q = D$. By
Corollary 3 to Theorem 3.6 (p. 159), $Q$ is a product of elementary matrices,
say $Q = E_1 E_2 \cdots E_k$. Thus

$$D = Q^t A Q = E_k^t E_{k-1}^t \cdots E_1^t A E_1 E_2 \cdots E_k.$$

From the preceding equation, we conclude that by means of several elementary
column operations and the corresponding row operations, $A$ can be transformed
into a diagonal matrix $D$. Furthermore, if $E_1, E_2, \ldots, E_k$ are the
elementary matrices corresponding to these elementary column operations indexed
in the order performed, and if $Q = E_1 E_2 \cdots E_k$, then $Q^t A Q = D$.


Example 6
Let $A$ be the symmetric matrix in $M_{3\times 3}(R)$ defined by

$$A = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}.$$

We use the procedure just described to find an invertible matrix $Q$ and a
diagonal matrix $D$ such that $Q^t A Q = D$.


We begin by eliminating all of the nonzero entries in the first row and
first column except for the entry in column 1 and row 1. To this end, we
add the first column of $A$ to the second column to produce a zero in row 1
and column 2. The elementary matrix that corresponds to this elementary
column operation is

$$E_1 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

We perform the corresponding elementary operation on the rows of $AE_1$ to
obtain

$$E_1^t A E_1 = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 4 \\ 3 & 4 & 1 \end{pmatrix}.$$

We now use the first column of $E_1^t A E_1$ to eliminate the 3 in row 1 column 3,
and follow this operation with the corresponding row operation. The corresponding
elementary matrix $E_2$ and the result of the elementary operations
$E_2^t E_1^t A E_1 E_2$ are, respectively,

$$E_2 = \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad E_2^t E_1^t A E_1 E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 4 \\ 0 & 4 & -8 \end{pmatrix}.$$

Finally, we subtract 4 times the second column of $E_2^t E_1^t A E_1 E_2$ from the
third column and follow this with the corresponding row operation. The corresponding
elementary matrix $E_3$ and the result of the elementary operations
$E_3^t E_2^t E_1^t A E_1 E_2 E_3$ are, respectively,

$$E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad E_3^t E_2^t E_1^t A E_1 E_2 E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}.$$

Since we have obtained a diagonal matrix, the process is complete. So we let

$$Q = E_1 E_2 E_3 = \begin{pmatrix} 1 & 1 & -7 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}$$

to obtain the desired diagonalization $Q^t A Q = D$.


The reader should justify the following method for computing $Q$ without
recording each elementary matrix separately. The method is inspired by the
algorithm for computing the inverse of a matrix developed in Section 3.2.
We use a sequence of elementary column operations and corresponding row
operations to change the $n \times 2n$ matrix $(A \mid I)$ into the form $(D \mid B)$, where $D$
is a diagonal matrix and $B = Q^t$. Each column operation is applied only to the
first $n$ columns, while each row operation is applied to the entire matrix. It then
follows that $D = Q^t A Q$.

Starting with the matrix A of the preceding example, this method pro- 
duces the following sequence of matrices: 


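$$(A \mid I) = \left(\begin{array}{ccc|ccc} 1 & -1 & 3 & 1 & 0 & 0 \\ -1 & 2 & 1 & 0 & 1 & 0 \\ 3 & 1 & 1 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & 0 & 0 \\ -1 & 1 & 1 & 0 & 1 & 0 \\ 3 & 4 & 1 & 0 & 0 & 1 \end{array}\right)$$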


Sec. 6.8 Bilinear and Quadratic Forms 433 


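$$\longrightarrow \left(\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 3 & 4 & 1 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 3 & 4 & -8 & 0 & 0 & 1 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 0 & 4 & -8 & -3 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 4 & -24 & -3 & 0 & 1 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & -24 & -7 & -4 & 1 \end{array}\right) = (D \mid Q^t).$$

Therefore

$$D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}, \quad Q^t = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -7 & -4 & 1 \end{pmatrix}, \quad\text{and}\quad Q = \begin{pmatrix} 1 & 1 & -7 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}.$$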
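The $(A \mid I)$ procedure translates directly into code. The Python sketch below is one possible implementation under stated assumptions, not the book's: it works over the rationals with `fractions.Fraction`, applies each column operation to the left block only and the matching row operation to the whole row, and reads $Q^t$ off the right block. When a diagonal pivot is zero it either swaps in a later nonzero diagonal entry or, failing that, adds a later row and column, which is the idea behind the lemma preceding Theorem 6.35. On the matrix of Example 6 it performs exactly the three operations used there. The name `congruence_diagonalize` is hypothetical.

```python
from fractions import Fraction

def congruence_diagonalize(A):
    """Return (D, Q) with Q invertible, D diagonal, and Q^t A Q = D.

    Assumes A is a symmetric matrix of integers or Fractions (the rationals
    are not of characteristic two).
    """
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]

    def add(dst, src, c):
        for i in range(n):               # column op on the left block: col_dst += c * col_src
            M[i][dst] += c * M[i][src]
        for j in range(2 * n):           # matching row op on the whole row: row_dst += c * row_src
            M[dst][j] += c * M[src][j]

    def swap(k, j):
        for row in M:                    # interchange columns k and j of the left block
            row[k], row[j] = row[j], row[k]
        M[k], M[j] = M[j], M[k]          # interchange rows k and j of the whole matrix

    for k in range(n):
        if M[k][k] == 0:
            j = next((j for j in range(k + 1, n) if M[j][j] != 0), None)
            if j is not None:
                swap(k, j)               # bring a nonzero diagonal entry into position k
            else:
                j = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
                if j is None:
                    continue             # row and column k are already zero off the diagonal
                add(k, j, Fraction(1))   # new pivot is 2 * M[k][j] != 0, as in the lemma
        for j in range(k + 1, n):
            add(j, k, -M[k][j] / M[k][k])  # clears entry (k, j) and, by symmetry, (j, k)

    D = [row[:n] for row in M]
    Q = [list(col) for col in zip(*(row[n:] for row in M))]   # transpose of the right block
    return D, Q


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, -1, 3], [-1, 2, 1], [3, 1, 1]]         # the matrix of Example 6
D, Q = congruence_diagonalize(A)
Qt = [list(r) for r in zip(*Q)]
assert matmul(matmul(Qt, A), Q) == D
assert D == [[1, 0, 0], [0, 1, 0], [0, 0, -24]]
assert Q == [[1, 1, -7], [0, 1, -4], [0, 0, 1]]
```

The zero-pivot branch matters for matrices such as $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, whose diagonal is entirely zero; over the rationals it is congruent to $\operatorname{diag}(2, -\tfrac{1}{2})$, while over $Z_2$ (Example 5) no such diagonalization exists.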


Quadratic Forms 


Associated with symmetric bilinear forms are functions called quadratic 
forms. 


Definition. Let V be a vector space over F. A function K: V > F is 
called a quadratic form if there exists a symmetric bilinear form H € B(V) 
such that 


K(«)=H(a,x) forallaeV. (8) 


If the field $F$ is not of characteristic two, there is a one-to-one correspondence between symmetric bilinear forms and quadratic forms given by (8). In fact, if $K$ is a quadratic form on a vector space $V$ over a field $F$ not of characteristic two, and $K(x) = H(x, x)$ for some symmetric bilinear form $H$ on $V$, then we can recover $H$ from $K$ because

$$H(x, y) = \frac{1}{2}\left[K(x + y) - K(x) - K(y)\right]. \tag{9}$$


(See Exercise 16.) 
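Identity (9) can be spot-checked over $\mathbb{Q}$. The short Python sketch below assumes $H(x, y) = x^tAy$ for an arbitrarily chosen symmetric integer matrix $A$. It recovers $H$ from $K$ alone on random integer vectors. This only illustrates the identity and does not replace Exercise 16. The division by 2 is exactly where the characteristic-two hypothesis enters.

```python
import random
from fractions import Fraction


def bilinear(A, x, y):
    """H(x, y) = x^t A y for a symmetric matrix A given as lists of rows."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def recover_from_quadratic(K, x, y):
    """Right-hand side of (9), evaluated using only the function K."""
    s = [xi + yi for xi, yi in zip(x, y)]
    return Fraction(K(s) - K(x) - K(y)) / 2


A = [[2, 3, 0], [3, -1, -2], [0, -2, 0]]  # an arbitrary symmetric matrix


def K(v):
    """The quadratic form K(v) = H(v, v)."""
    return bilinear(A, v, v)


rng = random.Random(0)
for _ in range(100):
    x = [rng.randint(-9, 9) for _ in range(3)]
    y = [rng.randint(-9, 9) for _ in range(3)]
    assert recover_from_quadratic(K, x, y) == bilinear(A, x, y)
print("(9) recovered H(x, y) on every sampled pair")
```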


Example 7 


The classic example of a quadratic form is the homogeneous second-degree polynomial of several variables. Given the variables $t_1, t_2, \ldots, t_n$ that take values in a field $F$ not of characteristic two and given (not necessarily distinct) scalars $a_{ij}$ ($1 \leq i \leq j \leq n$), define the polynomial

$$f(t_1, t_2, \ldots, t_n) = \sum_{i \leq j} a_{ij}t_it_j.$$

Any such polynomial is a quadratic form. In fact, if $\beta$ is the standard ordered basis for $F^n$, then the symmetric bilinear form $H$ corresponding to the quadratic form $f$ has the matrix representation $\psi_\beta(H) = A$, where

$$A_{ij} = A_{ji} = \begin{cases} a_{ii} & \text{if } i = j, \\ \frac{1}{2}a_{ij} & \text{if } i \neq j. \end{cases}$$


To see this, apply (9) to obtain $H(e_i, e_j) = A_{ij}$ from the quadratic form $K$, and verify that $f$ is computable from $H$ by (8) using $f$ in place of $K$.


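For example, given the polynomial

$$f(t_1, t_2, t_3) = 2t_1^2 - t_2^2 + 6t_1t_2 - 4t_2t_3$$

with real coefficients, let

$$A = \begin{pmatrix} 2 & 3 & 0 \\ 3 & -1 & -2 \\ 0 & -2 & 0 \end{pmatrix}.$$

Setting $H(x, y) = x^tAy$ for all $x, y \in \mathbb{R}^3$, we see that

$$f(t_1, t_2, t_3) = (t_1, t_2, t_3)A\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} \quad\text{for } \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} \in \mathbb{R}^3.$$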
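The rule $A_{ii} = a_{ii}$, $A_{ij} = A_{ji} = \frac{1}{2}a_{ij}$ is mechanical, so a small Python sketch may help. The helper `matrix_of_quadratic` is hypothetical and written only for this illustration. It takes the coefficients as a dictionary keyed by 0-based index pairs $(i, j)$ with $i \leq j$. Then it confirms that $x^tAx$ reproduces $f$ at a few points.

```python
from fractions import Fraction


def matrix_of_quadratic(coeffs, n):
    """Symmetric A with x^t A x equal to the sum of a_ij t_i t_j over i <= j.

    coeffs maps 0-based pairs (i, j), i <= j, to a_ij. A diagonal coefficient
    goes to A_ii; an off-diagonal one is split evenly between A_ij and A_ji.
    """
    A = [[Fraction(0) for _ in range(n)] for _ in range(n)]
    for (i, j), a in coeffs.items():
        if i == j:
            A[i][i] += a
        else:
            A[i][j] += Fraction(a) / 2
            A[j][i] += Fraction(a) / 2
    return A


def quadratic_value(A, t):
    n = len(A)
    return sum(t[i] * A[i][j] * t[j] for i in range(n) for j in range(n))


# f(t1, t2, t3) = 2 t1^2 - t2^2 + 6 t1 t2 - 4 t2 t3
coeffs = {(0, 0): 2, (1, 1): -1, (0, 1): 6, (1, 2): -4}
A = matrix_of_quadratic(coeffs, 3)
print([[int(a) for a in row] for row in A])  # [[2, 3, 0], [3, -1, -2], [0, -2, 0]]


def f(t1, t2, t3):
    return 2 * t1**2 - t2**2 + 6 * t1 * t2 - 4 * t2 * t3


assert all(quadratic_value(A, t) == f(*t) for t in [(1, 2, 3), (-4, 0, 5), (7, -1, 2)])
```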


Quadratic Forms Over the Field R 


Since symmetric matrices over R are orthogonally diagonalizable (see Theorem 6.20, p. 384), the theory of symmetric bilinear forms and quadratic forms
on finite-dimensional vector spaces over R is especially nice. The following 
theorem and its corollary are useful. 


Theorem 6.36. Let $V$ be a finite-dimensional real inner product space, and let $H$ be a symmetric bilinear form on $V$. Then there exists an orthonormal basis $\beta$ for $V$ such that $\psi_\beta(H)$ is a diagonal matrix.


Proof. Choose any orthonormal basis $\gamma = \{v_1, v_2, \ldots, v_n\}$ for $V$, and let $A = \psi_\gamma(H)$. Since $A$ is symmetric, Theorem 6.20 gives an orthogonal matrix $Q$ and a diagonal matrix $D$ such that $D = Q^tAQ$. Let $\beta = \{w_1, w_2, \ldots, w_n\}$ be defined by

$$w_j = \sum_{i=1}^n Q_{ij}v_i \quad\text{for } 1 \leq j \leq n.$$

By Theorem 6.33, $\psi_\beta(H) = D$. Furthermore, since $Q$ is orthogonal and $\gamma$ is orthonormal, $\beta$ is orthonormal by Exercise 30 of Section 6.5. ∎
Corollary. Let $K$ be a quadratic form on a finite-dimensional real inner product space $V$. There exists an orthonormal basis $\beta = \{v_1, v_2, \ldots, v_n\}$ for $V$ and scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$ (not necessarily distinct) such that if $x \in V$ and

$$x = \sum_{i=1}^n s_iv_i, \quad s_i \in \mathbb{R},$$

then

$$K(x) = \sum_{i=1}^n \lambda_is_i^2.$$

In fact, if $H$ is the symmetric bilinear form determined by $K$, then $\beta$ can be chosen to be any orthonormal basis for $V$ such that $\psi_\beta(H)$ is a diagonal matrix.


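Proof. Let $H$ be the symmetric bilinear form for which $K(x) = H(x, x)$ for all $x \in V$. By Theorem 6.36, there exists an orthonormal basis $\beta = \{v_1, v_2, \ldots, v_n\}$ for $V$ such that $\psi_\beta(H)$ is the diagonal matrix

$$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

Let $x \in V$, and suppose that $x = \sum_{i=1}^n s_iv_i$. Then

$$K(x) = H(x, x) = [\phi_\beta(x)]^tD[\phi_\beta(x)] = (s_1, s_2, \ldots, s_n)D\begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix} = \sum_{i=1}^n \lambda_is_i^2.$$

∎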


Example 8 


For the homogeneous real polynomial of degree 2 defined by

$$f(t_1, t_2) = 5t_1^2 + 2t_2^2 + 4t_1t_2, \tag{10}$$

we find an orthonormal basis $\gamma = \{v_1, v_2\}$ for $\mathbb{R}^2$ and scalars $\lambda_1$ and $\lambda_2$ such that if

$$\begin{pmatrix} t_1 \\ t_2 \end{pmatrix} \in \mathbb{R}^2 \quad\text{and}\quad \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = s_1v_1 + s_2v_2,$$

then $f(t_1, t_2) = \lambda_1s_1^2 + \lambda_2s_2^2$. We can think of $s_1$ and $s_2$ as the coordinates of $(t_1, t_2)$ relative to $\gamma$. Thus the polynomial $f(t_1, t_2)$, as an expression involving the coordinates of a point with respect to the standard ordered basis for $\mathbb{R}^2$, is transformed into a new polynomial $g(s_1, s_2) = \lambda_1s_1^2 + \lambda_2s_2^2$ interpreted as an expression involving the coordinates of a point relative to the new ordered basis $\gamma$.


Let $H$ denote the symmetric bilinear form corresponding to the quadratic form defined by (10), let $\beta$ be the standard ordered basis for $\mathbb{R}^2$, and let $A = \psi_\beta(H)$. Then

$$A = \psi_\beta(H) = \begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix}.$$


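Next, we find an orthogonal matrix $Q$ such that $Q^tAQ$ is a diagonal matrix. For this purpose, observe that $\lambda_1 = 6$ and $\lambda_2 = 1$ are the eigenvalues of $A$ with corresponding orthonormal eigenvectors

$$v_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad\text{and}\quad v_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -2 \end{pmatrix}.$$

Let $\gamma = \{v_1, v_2\}$. Then $\gamma$ is an orthonormal basis for $\mathbb{R}^2$ consisting of eigenvectors of $A$. Hence, setting

$$Q = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix},$$

we see that $Q$ is an orthogonal matrix and

$$Q^tAQ = \begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.$$

Clearly $Q$ is also a change of coordinate matrix. Consequently,

$$\psi_\gamma(H) = Q^t\psi_\beta(H)Q = Q^tAQ = \begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.$$

Thus by the corollary to Theorem 6.36,

$$K(x) = 6s_1^2 + s_2^2$$

for any $x = s_1v_1 + s_2v_2 \in \mathbb{R}^2$. So $g(s_1, s_2) = 6s_1^2 + s_2^2$.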
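A quick floating-point check of this example is possible in Python. It is a sketch under one assumption: the orthonormal eigenvectors are taken to be $(2, 1)/\sqrt{5}$ and $(1, -2)/\sqrt{5}$, which are the columns of $Q$. The code confirms that $Q^tAQ \approx \operatorname{diag}(6, 1)$. It also confirms that $f(t_1, t_2) = g(s_1, s_2)$ once $s = Q^tt$. That relation holds because $Q$ is orthogonal, so $Q^{-1} = Q^t$.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


A = [[5, 2], [2, 2]]                     # psi_beta(H) for the form in (10)
r = 1 / math.sqrt(5)
Q = [[2 * r, 1 * r],                    # columns: v1 = (2, 1)/sqrt(5)
     [1 * r, -2 * r]]                   #          v2 = (1, -2)/sqrt(5)

D = matmul(matmul(transpose(Q), A), Q)  # approximately [[6, 0], [0, 1]]


def f(t1, t2):
    return 5 * t1**2 + 2 * t2**2 + 4 * t1 * t2


def g(s1, s2):
    return 6 * s1**2 + s2**2


t = [0.3, -1.7]
s = [sum(Q[k][i] * t[k] for k in range(2)) for i in range(2)]  # s = Q^t t
print(f(*t), g(*s))                     # equal up to rounding error
```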


The next example illustrates how the theory of quadratic forms can be applied to the problem of describing quadratic surfaces in $\mathbb{R}^3$.


Example 9 
Let S be the surface in R® defined by the equation 


2t? + Gtite + 5t2 — Qtots + 22 + 3t, — 2to — tg +14=0. (11) 
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Then (11) describes the points of S in terms of their coordinates relative to /, 
the standard ordered basis for R®. We find a new orthonormal basis 7 for R? 


so that the equation describing the coordinates of S relative to 7 is simpler 
than (11). 


We begin with the observation that the terms of second degree on the left side of (11) add to form a quadratic form $K$ on $\mathbb{R}^3$:

$$K\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = 2t_1^2 + 6t_1t_2 + 5t_2^2 - 2t_2t_3 + 2t_3^2.$$


Next, we diagonalize $K$. Let $H$ be the symmetric bilinear form corresponding to $K$, and let $A = \psi_\beta(H)$. Then

$$A = \begin{pmatrix} 2 & 3 & 0 \\ 3 & 5 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$


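The characteristic polynomial of $A$ is $(-1)(t - 2)(t - 7)t$; hence $A$ has the eigenvalues $\lambda_1 = 2$, $\lambda_2 = 7$, and $\lambda_3 = 0$. Corresponding unit eigenvectors are

$$v_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \quad v_2 = \frac{1}{\sqrt{35}}\begin{pmatrix} 3 \\ 5 \\ -1 \end{pmatrix}, \quad\text{and}\quad v_3 = \frac{1}{\sqrt{14}}\begin{pmatrix} -3 \\ 2 \\ 1 \end{pmatrix}.$$

Set $\gamma = \{v_1, v_2, v_3\}$ and

$$Q = \begin{pmatrix} \frac{1}{\sqrt{10}} & \frac{3}{\sqrt{35}} & \frac{-3}{\sqrt{14}} \\ 0 & \frac{5}{\sqrt{35}} & \frac{2}{\sqrt{14}} \\ \frac{3}{\sqrt{10}} & \frac{-1}{\sqrt{35}} & \frac{1}{\sqrt{14}} \end{pmatrix}.$$

As in Example 8, $Q$ is a change of coordinate matrix changing $\gamma$-coordinates to $\beta$-coordinates, and

$$\psi_\gamma(H) = Q^t\psi_\beta(H)Q = Q^tAQ = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$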


By the corollary to Theorem 6.36, if $x = s_1v_1 + s_2v_2 + s_3v_3$, then

$$K(x) = 2s_1^2 + 7s_2^2. \tag{12}$$
Figure 6.7


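We are now ready to transform (11) into an equation involving coordinates relative to $\gamma$. Let $x = (t_1, t_2, t_3) \in \mathbb{R}^3$, and suppose that $x = s_1v_1 + s_2v_2 + s_3v_3$. Then, by Theorem 2.22 (p. 111),

$$\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = Q\begin{pmatrix} s_1 \\ s_2 \\ s_3 \end{pmatrix},$$

and therefore

$$t_1 = \frac{s_1}{\sqrt{10}} + \frac{3s_2}{\sqrt{35}} - \frac{3s_3}{\sqrt{14}}, \quad t_2 = \frac{5s_2}{\sqrt{35}} + \frac{2s_3}{\sqrt{14}}, \quad\text{and}\quad t_3 = \frac{3s_1}{\sqrt{10}} - \frac{s_2}{\sqrt{35}} + \frac{s_3}{\sqrt{14}}.$$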
Thus

$$3t_1 - 2t_2 - t_3 = -\frac{14}{\sqrt{14}}s_3 = -\sqrt{14}\,s_3.$$

Combining (11), (12), and the preceding equation, we conclude that if $x \in \mathbb{R}^3$ and $x = s_1v_1 + s_2v_2 + s_3v_3$, then $x \in S$ if and only if

$$2s_1^2 + 7s_2^2 - \sqrt{14}\,s_3 + 14 = 0 \quad\text{or}\quad s_3 = \frac{\sqrt{14}}{7}s_1^2 + \frac{\sqrt{14}}{2}s_2^2 + \sqrt{14}.$$


Consequently, if we draw new axes $x'$, $y'$, and $z'$ in the directions of $v_1$, $v_2$, and $v_3$, respectively, the graph of the equation, rewritten as

$$z' = \frac{\sqrt{14}}{7}(x')^2 + \frac{\sqrt{14}}{2}(y')^2 + \sqrt{14},$$

coincides with the surface $S$. We recognize $S$ to be an elliptic paraboloid.


Figure 6.7 is a sketch of the surface $S$ drawn so that the vectors $v_1$, $v_2$, and $v_3$ are oriented to lie in the principal directions. For practical purposes, the scale of the $z'$ axis has been adjusted so that the figure fits the page.
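The computations in this example can be checked numerically with a floating-point sketch. It assumes the eigenvectors $v_1, v_2, v_3$ listed above as the columns of $Q$. The helper `point_on_S` is hypothetical: it picks $s_1, s_2$, solves the transformed equation for $s_3$, and maps back to $t = Qs$. If the change of coordinates is right, the left side of (11) should vanish at that point up to rounding.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


A = [[2, 3, 0], [3, 5, -1], [0, -1, 2]]
a, b, c = 1 / math.sqrt(10), 1 / math.sqrt(35), 1 / math.sqrt(14)
Q = [[a, 3 * b, -3 * c],        # columns are the unit eigenvectors v1, v2, v3
     [0.0, 5 * b, 2 * c],
     [3 * a, -b, c]]

D = matmul(matmul(transpose(Q), A), Q)   # approximately diag(2, 7, 0)


def lhs_11(t1, t2, t3):
    """Left side of equation (11)."""
    return (2 * t1**2 + 6 * t1 * t2 + 5 * t2**2 - 2 * t2 * t3 + 2 * t3**2
            + 3 * t1 - 2 * t2 - t3 + 14)


def point_on_S(s1, s2):
    """Solve for s3 in gamma-coordinates, then return t = Q s."""
    r = math.sqrt(14)
    s = (s1, s2, r / 7 * s1**2 + r / 2 * s2**2 + r)
    return [sum(Q[i][j] * s[j] for j in range(3)) for i in range(3)]


for s1, s2 in [(0.0, 0.0), (1.0, -0.5), (-2.0, 1.5)]:
    print(s1, s2, lhs_11(*point_on_S(s1, s2)))   # residuals near zero
```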


The Second Derivative Test for Functions of Several Variables 


We now consider an application of the theory of quadratic forms to mul- 
tivariable calculus—the derivation of the second derivative test for local ex- 
trema of a function of several variables. We assume an acquaintance with the 
calculus of functions of several variables to the extent of Taylor’s theorem. 
The reader is undoubtedly familiar with the one-variable version of Taylor’s 
theorem. For a statement and proof of the multivariable version, consult, for 
example, An Introduction to Analysis 2d ed, by William R. Wade (Prentice 
Hall, Upper Saddle River, N.J., 2000). 

Let $z = f(t_1, t_2, \ldots, t_n)$ be a fixed real-valued function of $n$ real variables for which all third-order partial derivatives exist and are continuous. The function $f$ is said to have a local maximum at a point $p \in \mathbb{R}^n$ if there exists a $\delta > 0$ such that $f(p) \geq f(x)$ whenever $\|x - p\| < \delta$. Likewise, $f$ has a local minimum at $p \in \mathbb{R}^n$ if there exists a $\delta > 0$ such that $f(p) \leq f(x)$ whenever $\|x - p\| < \delta$. If $f$ has either a local minimum or a local maximum at $p$, we say that $f$ has a local extremum at $p$. A point $p \in \mathbb{R}^n$ is called a critical point of $f$ if $\partial f(p)/\partial t_i = 0$ for $i = 1, 2, \ldots, n$. It is a well-known fact that if $f$ has a local extremum at a point $p \in \mathbb{R}^n$, then $p$ is a critical point of $f$. For, if $f$ has a local extremum at $p = (p_1, p_2, \ldots, p_n)$, then for any $i = 1, 2, \ldots, n$ the function $\phi_i$ defined by $\phi_i(t) = f(p_1, \ldots, p_{i-1}, t, p_{i+1}, \ldots, p_n)$ has a local extremum at $t = p_i$. So, by an elementary single-variable argument,

$$\frac{\partial f(p)}{\partial t_i} = \frac{d\phi_i(p_i)}{dt} = 0.$$


Thus p is a critical point of f. But critical points are not necessarily local 
extrema. 

The second-order partial derivatives of $f$ at a critical point $p$ can often be used to test for a local extremum at $p$. These partials determine a matrix $A(p)$ in which the row $i$, column $j$ entry is

$$\frac{\partial^2 f(p)}{(\partial t_i)(\partial t_j)}.$$


This matrix is called the Hessian matrix of f at p. Note that if the third- 
order partial derivatives of f are continuous, then the mixed second-order 
partials of f at p are independent of the order in which they are taken, and 
hence A(p) is a symmetric matrix. In this case, all of the eigenvalues of A(p) 
are real. 


Theorem 6.37 (The Second Derivative Test). Let $f(t_1, t_2, \ldots, t_n)$ be a real-valued function in $n$ real variables for which all third-order partial derivatives exist and are continuous. Let $p = (p_1, p_2, \ldots, p_n)$ be a critical point of $f$, and let $A(p)$ be the Hessian of $f$ at $p$.

(a) If all eigenvalues of A(p) are positive, then f has a local minimum at p. 
(b) If all eigenvalues of A(p) are negative, then f has a local maximum at p. 
(c) If A(p) has at least one positive and at least one negative eigenvalue, 
then f has no local extremum at p (p is called a saddle-point of f). 
(d) If rank(A(p)) <n and A(p) does not have both positive and negative 
eigenvalues, then the second derivative test is inconclusive. 
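Theorem 6.37 needs only the signs of the eigenvalues of $A(p)$. The Python sketch below is a hypothetical helper, not part of the text. It assumes the Hessian entries are supplied exactly, as integers or fractions, so that the zero tests are reliable. Rather than computing eigenvalues, it diagonalizes $A(p)$ by congruence with the method of Example 6 and counts the positive and negative diagonal entries. That substitution rests on Sylvester's law of inertia (Theorem 6.38 later in this section), because an orthogonal diagonalization $Q^tA(p)Q$ is itself a congruence.

```python
from fractions import Fraction


def congruent_diagonal(A):
    """Diagonal entries of a diagonal matrix congruent to the real symmetric matrix A."""
    n = len(A)
    M = [[Fraction(a) for a in row] for row in A]

    def add(src, dst, c):
        # column dst += c * column src, then row dst += c * row src
        for r in range(n):
            M[r][dst] += c * M[r][src]
        for col in range(n):
            M[dst][col] += c * M[src][col]

    for k in range(n):
        if M[k][k] == 0:
            j = next((j for j in range(k + 1, n) if M[j][j] != 0), None)
            if j is not None:
                # interchange columns k and j, then rows k and j
                for r in range(n):
                    M[r][k], M[r][j] = M[r][j], M[r][k]
                M[k], M[j] = M[j], M[k]
            else:
                j = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
                if j is None:
                    continue  # row and column k are already zero
                add(j, k, 1)  # the new (k, k) entry is 2 * M[k][j] != 0
        for j in range(k + 1, n):
            add(k, j, -M[k][j] / M[k][k])  # clears entries (k, j) and (j, k)
    return [M[i][i] for i in range(n)]


def second_derivative_test(hessian):
    """Classify a critical point p from its Hessian A(p), following Theorem 6.37."""
    d = congruent_diagonal(hessian)
    n = len(d)
    positive = sum(1 for x in d if x > 0)
    negative = sum(1 for x in d if x < 0)
    if positive == n:
        return "local minimum"      # part (a)
    if negative == n:
        return "local maximum"      # part (b)
    if positive > 0 and negative > 0:
        return "saddle point"       # part (c)
    return "inconclusive"           # part (d): rank(A(p)) < n, no sign change


print(second_derivative_test([[2, 1], [1, 2]]))   # local minimum
print(second_derivative_test([[0, 1], [1, 0]]))   # saddle point
print(second_derivative_test([[2, 0], [0, 0]]))   # inconclusive
```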


Proof. If $p \neq 0$, we may define a function $g\colon \mathbb{R}^n \to \mathbb{R}$ by

$$g(t_1, t_2, \ldots, t_n) = f(t_1 + p_1, t_2 + p_2, \ldots, t_n + p_n) - f(p).$$
The following facts are easily verified. 


1. The function f has a local maximum [minimum] at p if and only if g 
has a local maximum [minimum] at 0 = (0,0,...,0). 

2. The partial derivatives of g at 0 are equal to the corresponding partial 
derivatives of f at p. 


3. $0$ is a critical point of $g$.

4. $A_{ij}(p) = \dfrac{\partial^2 g(0)}{(\partial t_i)(\partial t_j)}$ for all $i$ and $j$.

In view of these facts, we may assume without loss of generality that $p = 0$ and $f(p) = 0$.

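Now we apply Taylor's theorem to $f$ to obtain the second-order approximation of $f$ around $0$. We have

$$f(t_1, t_2, \ldots, t_n) = f(0) + \sum_{i=1}^n \frac{\partial f(0)}{\partial t_i}t_i + \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2 f(0)}{(\partial t_i)(\partial t_j)}t_it_j + S(t_1, t_2, \ldots, t_n), \tag{13}$$

where $S$ is a real-valued function on $\mathbb{R}^n$ such that

$$\lim_{x \to 0} \frac{S(x)}{\|x\|^2} = \lim_{(t_1, t_2, \ldots, t_n) \to 0} \frac{S(t_1, t_2, \ldots, t_n)}{t_1^2 + t_2^2 + \cdots + t_n^2} = 0. \tag{14}$$

Since $f(0) = 0$ and $0$ is a critical point of $f$, the first two terms on the right side of (13) vanish.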
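Let $K\colon \mathbb{R}^n \to \mathbb{R}$ be the quadratic form defined by

$$K\begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix} = \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2 f(0)}{(\partial t_i)(\partial t_j)}t_it_j, \tag{15}$$

let $H$ be the symmetric bilinear form corresponding to $K$, and let $\beta$ be the standard ordered basis for $\mathbb{R}^n$. It is easy to verify that $\psi_\beta(H) = \frac{1}{2}A(p)$. Since $A(p)$ is symmetric, Theorem 6.20 (p. 384) implies that there exists an orthogonal matrix $Q$ such that

$$Q^tA(p)Q = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

is a diagonal matrix whose diagonal entries are the eigenvalues of $A(p)$. Let $\gamma = \{v_1, v_2, \ldots, v_n\}$ be the orthonormal basis for $\mathbb{R}^n$ whose $i$th vector is the $i$th column of $Q$. Then $Q$ is the change of coordinate matrix changing $\gamma$-coordinates into $\beta$-coordinates, and by Theorem 6.33

$$\psi_\gamma(H) = Q^t\psi_\beta(H)Q = \frac{1}{2}Q^tA(p)Q = \begin{pmatrix} \frac{\lambda_1}{2} & 0 & \cdots & 0 \\ 0 & \frac{\lambda_2}{2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \frac{\lambda_n}{2} \end{pmatrix}.$$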
Suppose that $A(p)$ is not the zero matrix. Then $A(p)$ has nonzero eigenvalues. Choose $\epsilon > 0$ such that $\epsilon < |\lambda_i|/2$ for all $\lambda_i \neq 0$. By (14), there exists $\delta > 0$ such that for any $x \in \mathbb{R}^n$ satisfying $0 < \|x\| < \delta$, we have $|S(x)| < \epsilon\|x\|^2$. Consider any $x \in \mathbb{R}^n$ such that $0 < \|x\| < \delta$. Then, by (13) and (15),

$$|f(x) - K(x)| = |S(x)| < \epsilon\|x\|^2,$$

and hence

$$K(x) - \epsilon\|x\|^2 < f(x) < K(x) + \epsilon\|x\|^2. \tag{16}$$


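Suppose that $x = \sum_{i=1}^n s_iv_i$. Then

$$\|x\|^2 = \sum_{i=1}^n s_i^2 \quad\text{and}\quad K(x) = \frac{1}{2}\sum_{i=1}^n \lambda_is_i^2.$$

Combining these equations with (16), we obtain

$$\sum_{i=1}^n \left(\tfrac{1}{2}\lambda_i - \epsilon\right)s_i^2 < f(x) < \sum_{i=1}^n \left(\tfrac{1}{2}\lambda_i + \epsilon\right)s_i^2. \tag{17}$$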


Now suppose that all eigenvalues of $A(p)$ are positive. Then $\frac{1}{2}\lambda_i - \epsilon > 0$ for all $i$, and hence, by the left inequality in (17),

$$f(0) = 0 \leq \sum_{i=1}^n \left(\tfrac{1}{2}\lambda_i - \epsilon\right)s_i^2 < f(x).$$

Thus $f(0) \leq f(x)$ for $\|x\| < \delta$, and so $f$ has a local minimum at $0$. By a similar argument using the right inequality in (17), we have that if all of the eigenvalues of $A(p)$ are negative, then $f$ has a local maximum at $0$. This establishes (a) and (b) of the theorem.

Next, suppose that $A(p)$ has both a positive and a negative eigenvalue, say, $\lambda_i > 0$ and $\lambda_j < 0$ for some $i$ and $j$. Then $\frac{1}{2}\lambda_i - \epsilon > 0$ and $\frac{1}{2}\lambda_j + \epsilon < 0$. Let $s$ be any real number such that $0 < |s| < \delta$. Substituting $x = sv_i$ and $x = sv_j$ into the left inequality and the right inequality of (17), respectively, we obtain

$$f(0) = 0 < \left(\tfrac{1}{2}\lambda_i - \epsilon\right)s^2 < f(sv_i) \quad\text{and}\quad f(sv_j) < \left(\tfrac{1}{2}\lambda_j + \epsilon\right)s^2 < 0 = f(0).$$

Thus $f$ attains both positive and negative values arbitrarily close to $0$; so $f$ has neither a local maximum nor a local minimum at $0$. This establishes (c).
To show that the second-derivative test is inconclusive under the conditions stated in (d), consider the functions

$$f(t_1, t_2) = t_1^2 - t_2^4 \quad\text{and}\quad g(t_1, t_2) = t_1^2 + t_2^4$$

at $p = 0$. In both cases, the function has a critical point at $p$, and

$$A(p) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.$$

However, $f$ does not have a local extremum at $0$, whereas $g$ has a local minimum at $0$. ∎
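The two functions in part (d) share the Hessian $\operatorname{diag}(2, 0)$ at the origin but behave differently there. A few sample points in Python make the difference visible. This is an illustration only, not a proof: $f$ is positive along the $t_1$-axis and negative along the $t_2$-axis arbitrarily close to $0$, while $g$ never drops below $g(0, 0) = 0$ on a grid around the origin.

```python
def f(t1, t2):
    return t1**2 - t2**4


def g(t1, t2):
    return t1**2 + t2**4


# f takes both signs arbitrarily close to the origin
for s in (0.1, 0.01, 0.001):
    print(s, f(s, 0) > 0, f(0, s) < 0)

# g stays at or above g(0, 0) = 0 on a grid around the origin
grid = [k / 50 for k in range(-50, 51)]
print(min(g(x, y) for x in grid for y in grid))  # 0.0, attained at the origin
```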


Sylvester’s Law of Inertia 


Any two matrix representations of a bilinear form have the same rank 
because rank is preserved under congruence. We can therefore define the 
rank of a bilinear form to be the rank of any of its matrix representations. 
If a matrix representation is a diagonal matrix, then the rank is equal to the 
number of nonzero diagonal entries of the matrix. 

We confine our analysis to symmetric bilinear forms on finite-dimensional 
real vector spaces. Each such form has a diagonal matrix representation in 
which the diagonal entries may be positive, negative, or zero. Although these 
entries are not unique, we show that the number of entries that are positive 
and the number that are negative are unique. That is, they are independent 
of the choice of diagonal representation. This result is called Sylvester’s law 
of inertia. We prove the law and apply it to describe the equivalence classes 
of congruent symmetric real matrices. 


Theorem 6.38 (Sylvester’s Law of Inertia). Let H be a symmetric 
bilinear form on a finite-dimensional real vector space V. Then the number of 
positive diagonal entries and the number of negative diagonal entries in any 
diagonal matrix representation of H are each independent of the diagonal 
representation. 


Proof. Suppose that $\beta$ and $\gamma$ are ordered bases for $V$ that determine diagonal representations of $H$. Without loss of generality, we may assume that $\beta$ and $\gamma$ are ordered so that on each diagonal the entries are in the order of positive, negative, and zero. It suffices to show that both representations have the same number of positive entries because the number of negative entries is equal to the difference between the rank and the number of positive entries. Let $p$ and $q$ be the number of positive diagonal entries in the matrix representations of $H$ with respect to $\beta$ and $\gamma$, respectively. We suppose that $p \neq q$ and arrive at a contradiction. Without loss of generality, assume that $p < q$. Let

$$\beta = \{v_1, v_2, \ldots, v_p, \ldots, v_r, \ldots, v_n\} \quad\text{and}\quad \gamma = \{w_1, w_2, \ldots, w_q, \ldots, w_r, \ldots, w_n\},$$
where $r$ is the rank of $H$ and $n = \dim(V)$. Let $L\colon V \to \mathbb{R}^{p+r-q}$ be the mapping defined by

$$L(x) = (H(x, v_1), H(x, v_2), \ldots, H(x, v_p), H(x, w_{q+1}), \ldots, H(x, w_r)).$$

It is easily verified that $L$ is linear and $\operatorname{rank}(L) \leq p + r - q$. Hence

$$\operatorname{nullity}(L) \geq n - (p + r - q) > n - r.$$

So there exists a nonzero vector $v_0$ such that $v_0 \notin \operatorname{span}(\{v_{r+1}, v_{r+2}, \ldots, v_n\})$, but $v_0 \in N(L)$. Since $v_0 \in N(L)$, it follows that $H(v_0, v_i) = 0$ for $i \leq p$ and $H(v_0, w_i) = 0$ for $q < i \leq r$. Suppose that

$$v_0 = \sum_{j=1}^n a_jv_j = \sum_{j=1}^n b_jw_j.$$

For any $i \leq p$,

$$H(v_0, v_i) = H\Big(\sum_{j=1}^n a_jv_j, v_i\Big) = \sum_{j=1}^n a_jH(v_j, v_i) = a_iH(v_i, v_i).$$

But for $i \leq p$, we have $H(v_i, v_i) > 0$ and $H(v_0, v_i) = 0$, so that $a_i = 0$. Similarly, $b_i = 0$ for $q + 1 \leq i \leq r$. Since $v_0$ is not in the span of $\{v_{r+1}, v_{r+2}, \ldots, v_n\}$, it follows that $a_i \neq 0$ for some $p < i \leq r$. Thus

$$H(v_0, v_0) = H\Big(\sum_{j=1}^n a_jv_j, \sum_{i=1}^n a_iv_i\Big) = \sum_{j=1}^n a_j^2H(v_j, v_j) = \sum_{j=p+1}^r a_j^2H(v_j, v_j) < 0.$$

Furthermore,

$$H(v_0, v_0) = H\Big(\sum_{j=1}^n b_jw_j, \sum_{i=1}^n b_iw_i\Big) = \sum_{j=1}^n b_j^2H(w_j, w_j) = \sum_{j=1}^q b_j^2H(w_j, w_j) \geq 0.$$

So $H(v_0, v_0) < 0$ and $H(v_0, v_0) \geq 0$, which is a contradiction. We conclude that $p = q$. ∎
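Sylvester's law can be spot-checked in Python under stated assumptions. The sketch takes the matrix $A$ of Example 6 and forms many congruent matrices $P^tAP$ from random invertible integer matrices $P$. Each one is diagonalized by the congruence method of Example 6. The diagonal entries change from trial to trial, but the numbers of positive and negative entries should always be $(2, 1)$. The random choices and helper names are illustrative only.

```python
import random
from fractions import Fraction


def congruent_diagonal(A):
    """Diagonal entries of a diagonal matrix congruent to the real symmetric matrix A."""
    n = len(A)
    M = [[Fraction(a) for a in row] for row in A]

    def add(src, dst, c):
        # column dst += c * column src, then row dst += c * row src
        for r in range(n):
            M[r][dst] += c * M[r][src]
        for col in range(n):
            M[dst][col] += c * M[src][col]

    for k in range(n):
        if M[k][k] == 0:
            j = next((j for j in range(k + 1, n) if M[j][j] != 0), None)
            if j is not None:
                for r in range(n):          # interchange columns k and j ...
                    M[r][k], M[r][j] = M[r][j], M[r][k]
                M[k], M[j] = M[j], M[k]     # ... then rows k and j
            else:
                j = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
                if j is None:
                    continue
                add(j, k, 1)                # new (k, k) entry is 2 * M[k][j]
        for j in range(k + 1, n):
            add(k, j, -M[k][j] / M[k][k])
    return [M[i][i] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def det3(P):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = P
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def sign_counts(diagonal):
    return (sum(1 for x in diagonal if x > 0), sum(1 for x in diagonal if x < 0))


A = [[1, -1, 3], [-1, 2, 1], [3, 1, 1]]    # the matrix of Example 6
rng = random.Random(2)
trials = 0
while trials < 5:
    P = [[rng.randint(-3, 3) for _ in range(3)] for _ in range(3)]
    if det3(P) == 0:
        continue                            # keep only invertible P
    trials += 1
    d = congruent_diagonal(matmul(matmul(transpose(P), A), P))  # P^t A P is congruent to A
    print([str(x) for x in d], sign_counts(d))  # the counts are (2, 1) every time
```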


Definitions. The number of positive diagonal entries in a diagonal 
representation of a symmetric bilinear form on a real vector space is called 
the index of the form. The difference between the number of positive and 
the number of negative diagonal entries in a diagonal representation of a 
symmetric bilinear form is called the signature of the form. The three terms 
rank, index, and signature are called the invariants of the bilinear form 
because they are invariant with respect to matrix representations. These 
same terms apply to the associated quadratic form. Notice that the values of 
any two of these invariants determine the value of the third. 




Example 10 


The bilinear form corresponding to the quadratic form K of Example 9 has 
a 3 x 3 diagonal matrix representation with diagonal entries of 2, 7, and 0. 
Therefore the rank, index, and signature of $K$ are each 2.


Example 11 


The matrix representation of the bilinear form corresponding to the quadratic form $K(x, y) = x^2 - y^2$ on $\mathbb{R}^2$ with respect to the standard ordered basis is the diagonal matrix with diagonal entries of 1 and $-1$. Therefore the rank of $K$ is 2, the index of $K$ is 1, and the signature of $K$ is 0.
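Reading the invariants off a diagonal representation is simple bookkeeping. The small Python sketch below uses a helper name chosen here for illustration. It reproduces Examples 10 and 11. It also checks the remark that any two invariants determine the third, using $\text{signature} = 2 \cdot \text{index} - \text{rank}$.

```python
def invariants_from_diagonal(entries):
    """(rank, index, signature) read off the diagonal entries of a diagonal representation."""
    positive = sum(1 for d in entries if d > 0)
    negative = sum(1 for d in entries if d < 0)
    return positive + negative, positive, positive - negative


print(invariants_from_diagonal([2, 7, 0]))   # (2, 2, 2), Example 10
print(invariants_from_diagonal([1, -1]))     # (2, 1, 0), Example 11

# Any two invariants determine the third: signature = 2 * index - rank.
for entries in ([2, 7, 0], [1, -1], [1, 1, -24], [0, 0, -3]):
    rank, index, signature = invariants_from_diagonal(entries)
    assert signature == 2 * index - rank
```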


Since the congruence relation is intimately associated with bilinear forms, 
we can apply Sylvester’s law of inertia to study this relation on the set of real 
symmetric matrices. Let $A$ be an $n \times n$ real symmetric matrix, and suppose that $D$ and $E$ are each diagonal matrices congruent to $A$. By Corollary 3 to Theorem 6.32, $A$ is the matrix representation of the bilinear form $H$ on $\mathbb{R}^n$ defined by $H(x, y) = x^tAy$ with respect to the standard ordered basis for $\mathbb{R}^n$. Therefore Sylvester's law of inertia tells us that $D$ and $E$ have the same
number of positive and negative diagonal entries. We can state this result as 
the matrix version of Sylvester’s law. 


Corollary 1 (Sylvester’s Law of Inertia for Matrices). Let A be 
a real symmetric matrix. Then the number of positive diagonal entries and 
the number of negative diagonal entries in any diagonal matrix congruent to A are independent of the choice of the diagonal matrix.


Definitions. Let A be a real symmetric matrix, and let D be a diagonal 
matrix that is congruent to A. The number of positive diagonal entries of 
D is called the index of A. The difference between the number of positive 
diagonal entries and the number of negative diagonal entries of D is called 
the signature of A. As before, the rank, index, and signature of a matrix 
are called the invariants of the matrix, and the values of any two of these 
invariants determine the value of the third. 


Any two of these invariants can be used to determine an equivalence class 
of congruent real symmetric matrices. 


Corollary 2. Two real symmetric n x n matrices are congruent if and 
only if they have the same invariants. 


Proof. If A and B are congruent n x n symmetric matrices, then they are 
both congruent to the same diagonal matrix, and it follows that they have 
the same invariants. 

Conversely, suppose that A and B are n x n symmetric matrices with the 
same invariants. Let D and E be diagonal matrices congruent to A and B, 
respectively, chosen so that the diagonal entries are in the order of positive, negative, and zero. (Exercise 23 allows us to do this.) Since $A$ and $B$ have the same invariants, so do $D$ and $E$. Let $p$ and $r$ denote the index and the rank, respectively, of both $D$ and $E$. Let $d_i$ denote the $i$th diagonal entry of $D$, and let $Q$ be the $n \times n$ diagonal matrix whose $i$th diagonal entry $q_i$ is given by

$$q_i = \begin{cases} \dfrac{1}{\sqrt{d_i}} & \text{if } 1 \leq i \leq p, \\[1ex] \dfrac{1}{\sqrt{-d_i}} & \text{if } p < i \leq r, \\[1ex] 1 & \text{if } r < i. \end{cases}$$

Then $Q^tDQ = J_{pr}$, where

$$J_{pr} = \begin{pmatrix} I_p & O & O \\ O & -I_{r-p} & O \\ O & O & O \end{pmatrix}.$$

It follows that $A$ is congruent to $J_{pr}$. Similarly, $B$ is congruent to $J_{pr}$, and hence $A$ is congruent to $B$. ∎


The matrix $J_{pr}$ acts as a canonical form for the theory of real symmetric matrices. The next corollary, whose proof is contained in the proof of Corollary 2, describes the role of $J_{pr}$.

Corollary 3. A real symmetric $n \times n$ matrix $A$ has index $p$ and rank $r$ if and only if $A$ is congruent to $J_{pr}$ (as just defined).
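The scaling step in the proof of Corollary 2 can be sketched numerically in Python, using floating-point square roots. The helper `canonical_scaling` is hypothetical. It first reorders a list of diagonal entries as positive, negative, zero, a permutation that Exercise 23 says is a congruence. It then builds the factors $q_i = 1/\sqrt{|d_i|}$ for nonzero $d_i$ and $q_i = 1$ otherwise. Finally it returns the diagonal of $Q^tDQ$, which should be the diagonal of $J_{pr}$.

```python
import math


def canonical_scaling(d):
    """Follow the proof of Corollary 2 for a list d of diagonal entries.

    Returns the reordered entries e, the factors q_i, the diagonal of
    Q^t D Q (entries q_i^2 * e_i), the index p, and the rank r.
    """
    e = [x for x in d if x > 0] + [x for x in d if x < 0] + [x for x in d if x == 0]
    q = [1 / math.sqrt(abs(x)) if x != 0 else 1.0 for x in e]
    J = [qi * qi * x for qi, x in zip(q, e)]
    p = sum(1 for x in e if x > 0)
    r = sum(1 for x in e if x != 0)
    return e, q, J, p, r


e, q, J, p, r = canonical_scaling([1, 1, -24])
print(p, r, [round(x, 12) for x in J])  # 2 3 [1.0, 1.0, -1.0]
```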


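Example 12

Let

$$A = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix}, \quad\text{and}\quad C = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 1 \end{pmatrix}.$$

We apply Corollary 2 to determine which pairs of the matrices $A$, $B$, and $C$ are congruent.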


The matrix A is the 3 x 3 matrix of Example 6, where it is shown that 
A is congruent to a diagonal matrix with diagonal entries 1, 1, and —24. 
Therefore, A has rank 3 and index 2. Using the methods of Example 6 (it is 
not necessary to compute Q), it can be shown that B and C are congruent, 
respectively, to the diagonal matrices 


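$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -4 \end{pmatrix}.$$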
It follows that both $A$ and $C$ have rank 3 and index 2, while $B$ has rank 3 and index 1. We conclude that $A$ and $C$ are congruent but that $B$ is congruent to neither $A$ nor $C$.
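Corollary 2 turns the congruence question into a comparison of invariants, and that comparison is easy to automate. The Python sketch below uses hypothetical helper names. It computes (rank, index, signature) with the congruence diagonalization of Example 6 for the matrices $A$ and $C$ of this example. It also checks $A$ against the diagonal matrix $\operatorname{diag}(1, -1, -1)$ that the text gives for $B$.

```python
from fractions import Fraction


def congruent_diagonal(A):
    """Diagonal entries of a diagonal matrix congruent to the real symmetric matrix A."""
    n = len(A)
    M = [[Fraction(a) for a in row] for row in A]

    def add(src, dst, c):
        # column dst += c * column src, then row dst += c * row src
        for r in range(n):
            M[r][dst] += c * M[r][src]
        for col in range(n):
            M[dst][col] += c * M[src][col]

    for k in range(n):
        if M[k][k] == 0:
            j = next((j for j in range(k + 1, n) if M[j][j] != 0), None)
            if j is not None:
                for r in range(n):          # interchange columns k and j ...
                    M[r][k], M[r][j] = M[r][j], M[r][k]
                M[k], M[j] = M[j], M[k]     # ... then rows k and j
            else:
                j = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
                if j is None:
                    continue
                add(j, k, 1)                # new (k, k) entry is 2 * M[k][j]
        for j in range(k + 1, n):
            add(k, j, -M[k][j] / M[k][k])
    return [M[i][i] for i in range(n)]


def invariants(A):
    """(rank, index, signature) of a real symmetric matrix A."""
    d = congruent_diagonal(A)
    pos = sum(1 for x in d if x > 0)
    neg = sum(1 for x in d if x < 0)
    return pos + neg, pos, pos - neg


def congruent(A, B):
    """Corollary 2: same-size real symmetric matrices are congruent iff their invariants agree."""
    return len(A) == len(B) and invariants(A) == invariants(B)


A = [[1, -1, 3], [-1, 2, 1], [3, 1, 1]]
C = [[1, 0, 1], [0, 1, 2], [1, 2, 1]]
print(invariants(A), invariants(C), congruent(A, C))  # (3, 2, 1) (3, 2, 1) True
```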


EXERCISES 


1. Label the following statements as true or false. 


(a) Every quadratic form is a bilinear form. 

(b) If two matrices are congruent, they have the same eigenvalues. 

(c) Symmetric bilinear forms have symmetric matrix representations. 

(d) Any symmetric matrix is congruent to a diagonal matrix. 

(e) The sum of two symmetric bilinear forms is a symmetric bilinear 
form. 

(f) Two symmetric matrices with the same characteristic polynomial 
are matrix representations of the same bilinear form. 

(g) There exists a bilinear form $H$ such that $H(x, y) \neq 0$ for all $x$ and $y$.

(h) If $V$ is a vector space of dimension $n$, then $\dim(\mathcal{B}(V)) = 2n$.

(i) Let $H$ be a bilinear form on a finite-dimensional vector space $V$ with $\dim(V) > 1$. For any $x \in V$, there exists $y \in V$ such that $y \neq 0$, but $H(x, y) = 0$.

(j) If $H$ is any bilinear form on a finite-dimensional real inner product space $V$, then there exists an ordered basis $\beta$ for $V$ such that $\psi_\beta(H)$ is a diagonal matrix.


2. Prove properties 1, 2, 3, and 4 on page 423. 


3. (a) Prove that the sum of two bilinear forms is a bilinear form. 
(b) Prove that the product of a scalar and a bilinear form is a bilinear 
form. 
(c) Prove Theorem 6.31. 


4. Determine which of the mappings that follow are bilinear forms. Justify 
your answers. 


(a) Let $V = C([0, 1])$ be the space of continuous real-valued functions on the closed interval $[0, 1]$. For $f, g \in V$, define

$$H(f, g) = \int_0^1 f(t)g(t)\,dt.$$


(b) Let $V$ be a vector space over $F$, and let $J \in \mathcal{B}(V)$ be nonzero. Define $H\colon V \times V \to F$ by

$$H(x, y) = [J(x, y)]^2 \quad\text{for all } x, y \in V.$$
(c) Define $H\colon \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ by $H(t_1, t_2) = t_1 + 2t_2$.

(d) Consider the vectors of $\mathbb{R}^2$ as column vectors, and let $H\colon \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ be the function defined by $H(x, y) = \det(x, y)$, the determinant of the $2 \times 2$ matrix with columns $x$ and $y$.

(e) Let $V$ be a real inner product space, and let $H\colon V \times V \to \mathbb{R}$ be the function defined by $H(x, y) = \langle x, y \rangle$ for $x, y \in V$.

(f) Let $V$ be a complex inner product space, and let $H\colon V \times V \to \mathbb{C}$ be the function defined by $H(x, y) = \langle x, y \rangle$ for $x, y \in V$.


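5. Verify that each of the given mappings is a bilinear form. Then compute its matrix representation with respect to the given ordered basis $\beta$.

(a) $H\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$, where

$$H\left(\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\right) = a_1b_1 - 2a_1b_2 + a_2b_1 - a_3b_3$$

and

$$\beta = \left\{\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right\}.$$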


(b) Let $V = M_{2\times 2}(\mathbb{R})$ and


Ale oe och 


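Define $H\colon V \times V \to \mathbb{R}$ by $H(A, B) = \operatorname{tr}(A) \cdot \operatorname{tr}(B)$.

(c) Let $\beta = \{\cos t, \sin t, \cos 2t, \sin 2t\}$. Then $\beta$ is an ordered basis for $V = \operatorname{span}(\beta)$, a four-dimensional subspace of the space of all continuous functions on $\mathbb{R}$. Let $H\colon V \times V \to \mathbb{R}$ be the function defined by $H(f, g) = f(0) \cdot g''(0)$.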


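6. Let $H\colon \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ be the function defined by

$$H\left(\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\right) = a_1b_2 + a_2b_1 \quad\text{for } \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in \mathbb{R}^2.$$

(a) Prove that $H$ is a bilinear form.
(b) Find the $2 \times 2$ matrix $A$ such that $H(x, y) = x^tAy$ for all $x, y \in \mathbb{R}^2$.

For a $2 \times 2$ matrix $M$ with columns $x$ and $y$, the bilinear form $H(M) = H(x, y)$ is called the permanent of $M$.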


7. Let $V$ and $W$ be vector spaces over the same field, and let $T\colon V \to W$ be a linear transformation. For any $H \in \mathcal{B}(W)$, define $\widehat{T}(H)\colon V \times V \to F$ by $\widehat{T}(H)(x, y) = H(T(x), T(y))$ for all $x, y \in V$. Prove the following results.

(a) If $H \in \mathcal{B}(W)$, then $\widehat{T}(H) \in \mathcal{B}(V)$.
(b) $\widehat{T}\colon \mathcal{B}(W) \to \mathcal{B}(V)$ is a linear transformation.
(c) If $T$ is an isomorphism, then so is $\widehat{T}$.

8. Assume the notation of Theorem 6.32.

(a) Prove that for any ordered basis $\beta$, $\psi_\beta$ is linear.

(b) Let $\beta$ be an ordered basis for an $n$-dimensional space $V$ over $F$, and let $\phi_\beta\colon V \to F^n$ be the standard representation of $V$ with respect to $\beta$. For $A \in M_{n\times n}(F)$, define $H\colon V \times V \to F$ by $H(x, y) = [\phi_\beta(x)]^tA[\phi_\beta(y)]$. Prove that $H \in \mathcal{B}(V)$. Can you establish this as a corollary to Exercise 7?

(c) Prove the converse of (b): Let $H$ be a bilinear form on $V$. If $A = \psi_\beta(H)$, then $H(x, y) = [\phi_\beta(x)]^tA[\phi_\beta(y)]$.

9. (a) Prove Corollary 1 to Theorem 6.32.
(b) For a finite-dimensional vector space $V$, describe a method for finding an ordered basis for $\mathcal{B}(V)$.

10. Prove Corollary 2 to Theorem 6.32.

11. Prove Corollary 3 to Theorem 6.32.

12. Prove that the relation of congruence is an equivalence relation.

13. The following outline provides an alternative proof to Theorem 6.33.

(a) Suppose that $\beta$ and $\gamma$ are ordered bases for a finite-dimensional vector space $V$, and let $Q$ be the change of coordinate matrix changing $\gamma$-coordinates to $\beta$-coordinates. Prove that $\phi_\beta = L_Q\phi_\gamma$, where $\phi_\beta$ and $\phi_\gamma$ are the standard representations of $V$ with respect to $\beta$ and $\gamma$, respectively.

(b) Apply Corollary 2 to Theorem 6.32 to (a) to obtain an alternative proof of Theorem 6.33.

14. Let $V$ be a finite-dimensional vector space and $H \in \mathcal{B}(V)$. Prove that, for any ordered bases $\beta$ and $\gamma$ of $V$, $\operatorname{rank}(\psi_\beta(H)) = \operatorname{rank}(\psi_\gamma(H))$.

15. Prove the following results.

(a) Any square diagonal matrix is symmetric.
(b) Any matrix congruent to a diagonal matrix is symmetric.
(c) The corollary to Theorem 6.35.

16. Let $V$ be a vector space over a field $F$ not of characteristic two, and let $H$ be a symmetric bilinear form on $V$. Prove that if $K(x) = H(x, x)$ is the quadratic form associated with $H$, then, for all $x, y \in V$,

$$H(x, y) = \frac{1}{2}\left[K(x + y) - K(x) - K(y)\right].$$
17. For each of the given quadratic forms $K$ on a real inner product space $V$, find a symmetric bilinear form $H$ such that $K(x) = H(x, x)$ for all $x \in V$. Then find an orthonormal basis $\beta$ for $V$ such that $\psi_\beta(H)$ is a diagonal matrix.

(a) $K\colon \mathbb{R}^2 \to \mathbb{R}$ defined by $K\begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = -2t_1^2 + 4t_1t_2 + t_2^2$

(b) $K\colon \mathbb{R}^2 \to \mathbb{R}$ defined by $K\begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = 7t_1^2 - 8t_1t_2 + t_2^2$

(c) $K\colon \mathbb{R}^3 \to \mathbb{R}$ defined by $K\begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix} = 3t_1^2 + 3t_2^2 + 3t_3^2 - 2t_1t_3$


18. Let $S$ be the set of all $(t_1, t_2, t_3) \in \mathbb{R}^3$ for which

$$3t_1^2 + 3t_2^2 + 3t_3^2 - 2t_1t_3 + 2\sqrt{2}(t_1 + t_3) + 1 = 0.$$

Find an orthonormal basis $\beta$ for $\mathbb{R}^3$ for which the equation relating the coordinates of points of $S$ relative to $\beta$ is simpler. Describe $S$ geometrically.


19. Prove the following refinement of Theorem 6.37(d).

(a) If $0 < \operatorname{rank}(A) < n$ and $A$ has no negative eigenvalues, then $f$ has no local maximum at $p$.

(b) If $0 < \operatorname{rank}(A) < n$ and $A$ has no positive eigenvalues, then $f$ has no local minimum at $p$.


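20. Prove the following variation of the second-derivative test for the case $n = 2$: Define

$$D = \left[\frac{\partial^2 f(p)}{\partial t_1^2}\right]\left[\frac{\partial^2 f(p)}{\partial t_2^2}\right] - \left[\frac{\partial^2 f(p)}{\partial t_1\,\partial t_2}\right]^2.$$

(a) If $D > 0$ and $\partial^2 f(p)/\partial t_1^2 > 0$, then $f$ has a local minimum at $p$.

(b) If $D > 0$ and $\partial^2 f(p)/\partial t_1^2 < 0$, then $f$ has a local maximum at $p$.

(c) If $D < 0$, then $f$ has no local extremum at $p$.

(d) If $D = 0$, then the test is inconclusive.

Hint: Observe that, as in Theorem 6.37, $D = \det(A) = \lambda_1\lambda_2$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$.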


21. Let $A$ and $E$ be in $M_{n\times n}(F)$, with $E$ an elementary matrix. In Section 3.1, it was shown that $AE$ can be obtained from $A$ by means of an elementary column operation. Prove that $E^tA$ can be obtained by means of the same elementary operation performed on the rows rather than on the columns of $A$. Hint: Note that $E^tA = (A^tE)^t$.
22. For each of the following matrices $A$ with entries from $\mathbb{R}$, find a diagonal matrix $D$ and an invertible matrix $Q$ such that $Q^tAQ = D$.


3 1 2 
@ (3) ® Ga) © (14 0 
2 0 -l 
Hint for (b): Use an elementary operation other than interchanging columns.


23. Prove that if the diagonal entries of a diagonal matrix are permuted, 
then the resulting diagonal matrix is congruent to the original one. 


24. Let $T$ be a linear operator on a real inner product space $V$, and define $H\colon V \times V \to \mathbb{R}$ by $H(x, y) = \langle x, T(y) \rangle$ for all $x, y \in V$.


(a) Prove that H is a bilinear form. 

(b) Prove that H is symmetric if and only if T is self-adjoint. 

(c) What properties must T have for H to be an inner product on V? 

(d) Explain why H may fail to be a bilinear form if V is a complex 
inner product space. 


25. Prove the converse to Exercise 24(a): Let $V$ be a finite-dimensional real inner product space, and let $H$ be a bilinear form on $V$. Then there exists a unique linear operator $T$ on $V$ such that $H(x, y) = \langle x, T(y) \rangle$ for all $x, y \in V$. Hint: Choose an orthonormal basis $\beta$ for $V$, let $A = \psi_\beta(H)$, and let $T$ be the linear operator on $V$ such that $[T]_\beta = A$. Apply Exercise 8(c) of this section and Exercise 15 of Section 6.2 (p. 355).


26. Prove that the number of distinct equivalence classes of congruent nx n 
real symmetric matrices is 


$$\frac{(n + 1)(n + 2)}{2}.$$


6.9* EINSTEIN’S SPECIAL THEORY OF RELATIVITY 


As a consequence of physical experiments performed in the latter half of the 
nineteenth century (most notably the Michelson—Morley experiment of 1887), 
physicists concluded that the results obtained in measuring the speed of light 
are independent of the velocity of the instrument used to measure the speed of 
light. For example, suppose that while on Earth, an experimenter measures 
the speed of light emitted from the sun and finds it to be 186,000 miles per 
second. Now suppose that the experimenter places the measuring equipment 
in a spaceship that leaves Earth traveling at 100,000 miles per second in a 
direction away from the sun. A repetition of the same experiment from the 
spaceship yields the same result: Light is traveling at 186,000 miles per second 
relative to the spaceship, rather than 86,000 miles per second as one might expect!

This revelation led to a new way of relating coordinate systems used to 
locate events in space-time. The result was Albert Einstein’s special theory 
of relativity. In this section, we develop via a linear algebra viewpoint the 
essence of Einstein’s theory. 


Figure 6.8 (coordinate systems $S$ with axes $x, y, z$ and $S'$ with axes $x', y', z'$)


The basic problem is to compare two different inertial (nonaccelerating) coordinate systems $S$ and $S'$ in three-space ($\mathbb{R}^3$) that are in motion relative to each other under the assumption that the speed of light is the same when measured in either system. We assume that $S'$ moves at a constant velocity in relation to $S$ as measured from $S$. (See Figure 6.8.) To simplify matters, let us suppose that the following conditions hold:


1. The corresponding axes of S and $S'$ ($x$ and $x'$, $y$ and $y'$, $z$ and $z'$) are parallel, and the origin of $S'$ moves in the positive direction of the $x$-axis of S at a constant velocity $v > 0$ relative to S.

2. Two clocks C and $C'$ are placed in space: the first stationary relative to the coordinate system S and the second stationary relative to the coordinate system $S'$. These clocks are designed to give real numbers in units of seconds as readings. The clocks are calibrated so that at the instant the origins of S and $S'$ coincide, both clocks give the reading zero.

3. The unit of length is the light second (the distance light travels in 1 
second), and the unit of time is the second. Note that, with respect to 
these units, the speed of light is 1 light second per second. 


Given any event (something whose position and time of occurrence can be described), we may assign a set of space-time coordinates to it. For example, if p is an event that occurs at position
$$\begin{pmatrix}x\\y\\z\end{pmatrix}$$
relative to S and at time t as read on clock C, we can assign to p the set of coordinates
$$\begin{pmatrix}x\\y\\z\\t\end{pmatrix}.$$
This ordered 4-tuple is called the space-time coordinates of p relative to S and C. Likewise, p has a set of space-time coordinates
$$\begin{pmatrix}x'\\y'\\z'\\t'\end{pmatrix}$$
relative to $S'$ and $C'$.
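For a fixed velocity v, let $T_v\colon \mathbb{R}^4 \to \mathbb{R}^4$ be the mapping defined by
$$T_v\begin{pmatrix}x\\y\\z\\t\end{pmatrix} = \begin{pmatrix}x'\\y'\\z'\\t'\end{pmatrix},$$
where
$$\begin{pmatrix}x\\y\\z\\t\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}x'\\y'\\z'\\t'\end{pmatrix}$$
are the space-time coordinates of the same event with respect to S and C and with respect to $S'$ and $C'$, respectively.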

Einstein made certain assumptions about $T_v$ that led to his special theory of relativity. We formulate an equivalent set of assumptions.


Axioms of the Special Theory of Relativity

(R 1) The speed of any light beam, when measured in either coordinate system using a clock stationary relative to that coordinate system, is 1.

(R 2) The mapping $T_v\colon \mathbb{R}^4 \to \mathbb{R}^4$ is an isomorphism.

(R 3) If
$$T_v\begin{pmatrix}x\\y\\z\\t\end{pmatrix} = \begin{pmatrix}x'\\y'\\z'\\t'\end{pmatrix},$$
then $y' = y$ and $z' = z$.

(R 4) If
$$T_v\begin{pmatrix}x\\y_1\\z_1\\t\end{pmatrix} = \begin{pmatrix}x'\\y'\\z'\\t'\end{pmatrix} \quad\text{and}\quad T_v\begin{pmatrix}x\\y_2\\z_2\\t\end{pmatrix} = \begin{pmatrix}x''\\y''\\z''\\t''\end{pmatrix},$$
then $x'' = x'$ and $t'' = t'$.

(R 5) The origin of S moves in the negative direction of the $x'$-axis of $S'$ at the constant velocity $-v < 0$ as measured from $S'$.


Axioms (R 3) and (R 4) tell us that for $p \in \mathbb{R}^4$, the second and third coordinates of $T_v(p)$ are unchanged and the first and fourth coordinates of $T_v(p)$ are independent of the second and third coordinates of p.

As we will see, these five axioms completely characterize $T_v$. The operator $T_v$ is called the Lorentz transformation in direction x. We intend to compute $T_v$ and use it to study the curious phenomenon of time contraction.


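Theorem 6.39. On $\mathbb{R}^4$, the following statements are true.
(a) $T_v(e_i) = e_i$ for $i = 2, 3$.
(b) $\operatorname{span}(\{e_2, e_3\})$ is $T_v$-invariant.
(c) $\operatorname{span}(\{e_1, e_4\})$ is $T_v$-invariant.
(d) Both $\operatorname{span}(\{e_2, e_3\})$ and $\operatorname{span}(\{e_1, e_4\})$ are $T_v^*$-invariant.
(e) $T_v^*(e_i) = e_i$ for $i = 2, 3$.

Proof. (a) By axiom (R 2),
$$T_v\begin{pmatrix}0\\0\\0\\0\end{pmatrix} = \begin{pmatrix}0\\0\\0\\0\end{pmatrix},$$
and hence, by axiom (R 4), the first and fourth coordinates of
$$T_v\begin{pmatrix}0\\a\\b\\0\end{pmatrix}$$
are both zero for any $a, b \in \mathbb{R}$. Thus, by axiom (R 3),
$$T_v\begin{pmatrix}0\\1\\0\\0\end{pmatrix} = \begin{pmatrix}0\\1\\0\\0\end{pmatrix} \quad\text{and}\quad T_v\begin{pmatrix}0\\0\\1\\0\end{pmatrix} = \begin{pmatrix}0\\0\\1\\0\end{pmatrix}.$$
The proofs of (b), (c), and (d) are left as exercises.

(e) For any $j \neq 2$, $\langle T_v^*(e_2), e_j\rangle = \langle e_2, T_v(e_j)\rangle = 0$ by (a) and (c); for $j = 2$, $\langle T_v^*(e_2), e_j\rangle = \langle e_2, T_v(e_2)\rangle = \langle e_2, e_2\rangle = 1$ by (a). We conclude that $T_v^*(e_2)$ is a multiple of $e_2$ (i.e., that $T_v^*(e_2) = ke_2$ for some $k \in \mathbb{R}$). Thus,
$$1 = \langle e_2, e_2\rangle = \langle e_2, T_v(e_2)\rangle = \langle T_v^*(e_2), e_2\rangle = \langle ke_2, e_2\rangle = k,$$
and hence $T_v^*(e_2) = e_2$. Similarly, $T_v^*(e_3) = e_3$. ∎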
Suppose that, at the instant the origins of S and $S'$ coincide, a light flash is emitted from their common origin. The event of the light flash when measured either relative to S and C or relative to $S'$ and $C'$ has space-time coordinates
$$\begin{pmatrix}0\\0\\0\\0\end{pmatrix}.$$
Let P be the set of all events whose space-time coordinates
$$\begin{pmatrix}x\\y\\z\\t\end{pmatrix}$$
relative to S and C are such that the flash is observable from the point with coordinates
$$\begin{pmatrix}x\\y\\z\end{pmatrix}$$
(as measured relative to S) at the time t (as measured on C). Let us characterize P in terms of x, y, z, and t. Since the speed of light is 1, at any time $t \ge 0$ the light flash is observable from any point whose distance to the origin of S (as measured on S) is $t \cdot 1 = t$. These are precisely the points that lie on the sphere of radius t with center at the origin. The coordinates (relative to S) of such points satisfy the equation $x^2 + y^2 + z^2 - t^2 = 0$. Hence an event lies in P if and only if its space-time coordinates
$$\begin{pmatrix}x\\y\\z\\t\end{pmatrix} \qquad (t \ge 0)$$
relative to S and C satisfy the equation $x^2 + y^2 + z^2 - t^2 = 0$. By virtue of axiom (R 1), we can characterize P in terms of the space-time coordinates relative to $S'$ and $C'$ similarly: An event lies in P if and only if, relative to $S'$ and $C'$, its space-time coordinates
$$\begin{pmatrix}x'\\y'\\z'\\t'\end{pmatrix} \qquad (t' \ge 0)$$
satisfy the equation $(x')^2 + (y')^2 + (z')^2 - (t')^2 = 0$.
Let
$$A = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&-1\end{pmatrix}.$$


Theorem 6.40. If $\langle L_A(w), w\rangle = 0$ for some $w \in \mathbb{R}^4$, then
$$\langle T_v^* L_A T_v(w), w\rangle = 0.$$


Proof. Let
$$w = \begin{pmatrix}x\\y\\z\\t\end{pmatrix} \in \mathbb{R}^4,$$
and suppose that $\langle L_A(w), w\rangle = 0$.

Case 1. $t \ge 0$. Since $\langle L_A(w), w\rangle = x^2 + y^2 + z^2 - t^2$, the vector w gives the coordinates of an event in P relative to S and C. Because
$$\begin{pmatrix}x'\\y'\\z'\\t'\end{pmatrix} = T_v\begin{pmatrix}x\\y\\z\\t\end{pmatrix}$$
are the space-time coordinates of the same event relative to $S'$ and $C'$, the discussion preceding Theorem 6.40 yields
$$(x')^2 + (y')^2 + (z')^2 - (t')^2 = 0.$$
Thus $\langle T_v^* L_A T_v(w), w\rangle = \langle L_A T_v(w), T_v(w)\rangle = (x')^2 + (y')^2 + (z')^2 - (t')^2 = 0$, and the conclusion follows.

Case 2. $t < 0$. The proof follows by applying case 1 to $-w$. ∎


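We now proceed to deduce information about $T_v$. Let
$$w_1 = \begin{pmatrix}1\\0\\0\\1\end{pmatrix} \quad\text{and}\quad w_2 = \begin{pmatrix}1\\0\\0\\-1\end{pmatrix}.$$
By Exercise 3, $\{w_1, w_2\}$ is an orthogonal basis for $\operatorname{span}(\{e_1, e_4\})$, and $\operatorname{span}(\{e_1, e_4\})$ is $T_v^* L_A T_v$-invariant. The next result tells us even more.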


Theorem 6.41. There exist nonzero scalars a and b such that
(a) $T_v^* L_A T_v(w_1) = aw_2$.
(b) $T_v^* L_A T_v(w_2) = bw_1$.

Proof. (a) Because $\langle L_A(w_1), w_1\rangle = 0$, we have $\langle T_v^* L_A T_v(w_1), w_1\rangle = 0$ by Theorem 6.40. Thus $T_v^* L_A T_v(w_1)$ is orthogonal to $w_1$. Since $\operatorname{span}(\{e_1, e_4\}) = \operatorname{span}(\{w_1, w_2\})$ is $T_v^* L_A T_v$-invariant, $T_v^* L_A T_v(w_1)$ must lie in this set. But $\{w_1, w_2\}$ is an orthogonal basis for this subspace, and so $T_v^* L_A T_v(w_1)$ must be a multiple of $w_2$. Thus $T_v^* L_A T_v(w_1) = aw_2$ for some scalar a. Since $T_v$ and A are invertible, so is $T_v^* L_A T_v$. Thus $a \neq 0$, proving (a).

The proof of (b) is similar to (a). ∎


Corollary. Let $B_v = [T_v]_\beta$, where $\beta$ is the standard ordered basis for $\mathbb{R}^4$. Then
(a) $B_v^* A B_v = A$.
(b) $T_v^* L_A T_v = L_A$.

We leave the proof of the corollary as an exercise. For hints, see Exercise 4.

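Now consider the situation 1 second after the origins of S and $S'$ have coincided as measured by the clock C. Since the origin of $S'$ is moving along the x-axis at a velocity v as measured in S, its space-time coordinates relative to S and C are
$$\begin{pmatrix}v\\0\\0\\1\end{pmatrix}.$$
Similarly, the space-time coordinates for the origin of $S'$ relative to $S'$ and $C'$ must be
$$\begin{pmatrix}0\\0\\0\\t'\end{pmatrix}$$
for some $t' > 0$. Thus we have
$$T_v\begin{pmatrix}v\\0\\0\\1\end{pmatrix} = \begin{pmatrix}0\\0\\0\\t'\end{pmatrix} \quad\text{for some } t' > 0. \tag{18}$$
By the corollary to Theorem 6.41,
$$\left\langle T_v^* L_A T_v\begin{pmatrix}v\\0\\0\\1\end{pmatrix}, \begin{pmatrix}v\\0\\0\\1\end{pmatrix}\right\rangle = \left\langle L_A\begin{pmatrix}v\\0\\0\\1\end{pmatrix}, \begin{pmatrix}v\\0\\0\\1\end{pmatrix}\right\rangle = v^2 - 1. \tag{19}$$
But also
$$\left\langle T_v^* L_A T_v\begin{pmatrix}v\\0\\0\\1\end{pmatrix}, \begin{pmatrix}v\\0\\0\\1\end{pmatrix}\right\rangle = \left\langle L_A T_v\begin{pmatrix}v\\0\\0\\1\end{pmatrix}, T_v\begin{pmatrix}v\\0\\0\\1\end{pmatrix}\right\rangle = \left\langle L_A\begin{pmatrix}0\\0\\0\\t'\end{pmatrix}, \begin{pmatrix}0\\0\\0\\t'\end{pmatrix}\right\rangle = -(t')^2. \tag{20}$$
Combining (19) and (20), we conclude that $v^2 - 1 = -(t')^2$, or
$$t' = \sqrt{1 - v^2}. \tag{21}$$
Thus, from (18) and (21), we obtain
$$T_v\begin{pmatrix}v\\0\\0\\1\end{pmatrix} = \begin{pmatrix}0\\0\\0\\\sqrt{1-v^2}\end{pmatrix}. \tag{22}$$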

Next recall that the origin of S moves in the negative direction of the $x'$-axis of $S'$ at the constant velocity $-v < 0$ as measured from $S'$. [This fact is axiom (R 5).] Consequently, 1 second after the origins of S and $S'$ have coincided as measured on clock C, there exists a time $t'' > 0$ as measured on clock $C'$ such that
$$T_v\begin{pmatrix}0\\0\\0\\1\end{pmatrix} = \begin{pmatrix}-vt''\\0\\0\\t''\end{pmatrix}. \tag{23}$$
From (23), it follows in a manner similar to the derivation of (22) that
$$t'' = \frac{1}{\sqrt{1-v^2}}; \tag{24}$$
hence, from (23) and (24),
$$T_v\begin{pmatrix}0\\0\\0\\1\end{pmatrix} = \begin{pmatrix}\dfrac{-v}{\sqrt{1-v^2}}\\0\\0\\\dfrac{1}{\sqrt{1-v^2}}\end{pmatrix}. \tag{25}$$
The following result is now easily proved using (22), (25), and Theorem 6.39. 


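Theorem 6.42. Let $\beta$ be the standard ordered basis for $\mathbb{R}^4$. Then
$$[T_v]_\beta = B_v = \begin{pmatrix}\dfrac{1}{\sqrt{1-v^2}} & 0 & 0 & \dfrac{-v}{\sqrt{1-v^2}}\\ 0&1&0&0\\ 0&0&1&0\\ \dfrac{-v}{\sqrt{1-v^2}} & 0 & 0 & \dfrac{1}{\sqrt{1-v^2}}\end{pmatrix}.$$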
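As a numerical sanity check on Theorem 6.42 (an illustrative sketch, not part of the original argument), the following Python builds $B_v$ for a sample velocity in the units of this section (light seconds and seconds, so $0 \le v < 1$) and checks the corollary to Theorem 6.41 together with equations (22) and (25). It uses plain lists and the standard library only; the helper names (`lorentz_matrix`, `matmul`, `transpose`, `apply`, `close`) are our own and purely illustrative.

```python
import math

# Illustrative helpers (names are our own); matrices are lists of rows.
def lorentz_matrix(v):
    """B_v of Theorem 6.42, in light seconds and seconds (requires 0 <= v < 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, 0.0, 0.0, -v * g],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [-v * g, 0.0, 0.0, g]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def apply(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def close(P, Q, tol=1e-12):
    return all(abs(p - q) <= tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# The matrix A = diag(1, 1, 1, -1) used throughout this section.
A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

v = 0.6
B = lorentz_matrix(v)
# Corollary to Theorem 6.41(a): B_v^* A B_v = A (B_v is real, so B_v^* = B_v^T).
print(close(matmul(matmul(transpose(B), A), B), A))   # True
# Equation (22): T_v(v, 0, 0, 1) = (0, 0, 0, sqrt(1 - v^2)), here about (0, 0, 0, 0.8).
print(apply(B, [v, 0, 0, 1]))
# Equation (25): T_v(e_4) = (-v, 0, 0, 1)/sqrt(1 - v^2), here about (-0.75, 0, 0, 1.25).
print(apply(B, [0, 0, 0, 1]))
```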


Time Contraction 


A most curious and paradoxical conclusion follows if we accept Einstein’s theory. Suppose that an astronaut leaves our solar system in a space vehicle traveling at a fixed velocity v as measured relative to our solar system. It follows from Einstein’s theory that, at the end of time t as measured on Earth, the time that passes on the space vehicle is only $t\sqrt{1-v^2}$. To establish this result, consider the coordinate systems S and $S'$ and clocks C and $C'$ that we have been studying. Suppose that the origin of $S'$ coincides with the space vehicle and the origin of S coincides with a point in the solar system (stationary relative to the sun) so that the origins of S and $S'$ coincide and clocks C and $C'$ read zero at the moment the astronaut embarks on the trip.

As viewed from S, the space-time coordinates of the vehicle at any time $t > 0$ as measured by C are
$$\begin{pmatrix}vt\\0\\0\\t\end{pmatrix},$$
whereas, as viewed from $S'$, the space-time coordinates of the vehicle at any time $t' > 0$ as measured by $C'$ are
$$\begin{pmatrix}0\\0\\0\\t'\end{pmatrix}.$$
But if two sets of space-time coordinates
$$\begin{pmatrix}vt\\0\\0\\t\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}0\\0\\0\\t'\end{pmatrix}$$
are to describe the same event, it must follow that
$$T_v\begin{pmatrix}vt\\0\\0\\t\end{pmatrix} = \begin{pmatrix}0\\0\\0\\t'\end{pmatrix}.$$
Thus
$$\begin{pmatrix}\dfrac{1}{\sqrt{1-v^2}} & 0 & 0 & \dfrac{-v}{\sqrt{1-v^2}}\\ 0&1&0&0\\ 0&0&1&0\\ \dfrac{-v}{\sqrt{1-v^2}} & 0 & 0 & \dfrac{1}{\sqrt{1-v^2}}\end{pmatrix}\begin{pmatrix}vt\\0\\0\\t\end{pmatrix} = \begin{pmatrix}0\\0\\0\\t'\end{pmatrix}.$$
From the preceding equation, we obtain
$$\frac{-v^2 t}{\sqrt{1-v^2}} + \frac{t}{\sqrt{1-v^2}} = t', \quad\text{or}\quad t' = t\sqrt{1-v^2}. \tag{26}$$
This is the desired result.
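Equation (26) can also be checked numerically. The sketch below (our own illustration in standard-library Python; `lorentz_matrix` and `vehicle_clock` are hypothetical helper names) applies $B_v$ to the vehicle's coordinates $(vt, 0, 0, t)$ and confirms that the result has the form $(0, 0, 0, t')$ with $t' = t\sqrt{1-v^2}$.

```python
import math

def lorentz_matrix(v):
    """B_v of Theorem 6.42 (hypothetical helper; units of light seconds and seconds, 0 <= v < 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, 0.0, 0.0, -v * g], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [-v * g, 0.0, 0.0, g]]

def vehicle_clock(t, v):
    """Reading t' of clock C' at the event where the vehicle is at (vt, 0, 0) at time t on C."""
    event_in_S = [v * t, 0.0, 0.0, t]
    x1, x2, x3, t_prime = (sum(b * e for b, e in zip(row, event_in_S))
                           for row in lorentz_matrix(v))
    # The vehicle sits at the origin of S', so the spatial coordinates vanish (up to rounding).
    assert max(abs(x1), abs(x2), abs(x3)) < 1e-9
    return t_prime

t, v = 10.0, 0.8
print(vehicle_clock(t, v), t * math.sqrt(1.0 - v * v))  # both approximately 6.0
```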

A dramatic consequence of time contraction is that distances are con- 
tracted along the line of motion (see Exercise 9). 

Let us make one additional point. Suppose that we consider units of 
distance and time more commonly used than the light second and second, 
such as the mile and hour, or the kilometer and second. Let c denote the 
speed of light relative to our chosen units of distance. It is easily seen that if 
an object travels at a velocity v relative to a set of units, then it is traveling 
at a velocity v/c in units of light seconds per second. Thus, for an arbitrary 
set of units of distance and time, (26) becomes
$$t' = t\sqrt{1 - \frac{v^2}{c^2}}.$$
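In arbitrary units the formula depends only on the ratio v/c. A minimal sketch (the function name `contracted_time` is our own; 299,792.458 km/s is the defined speed of light in vacuum) shows that the same trip gives the same elapsed vehicle time whether v is expressed as a fraction of c or in kilometers per second.

```python
import math

def contracted_time(t, v, c=1.0):
    """Time t' = t * sqrt(1 - v^2/c^2) elapsed on a vehicle moving at speed v while time t elapses in S.

    v and c must be in the same velocity units; the default c = 1 corresponds to
    light seconds per second, the units used in this section.
    """
    if not 0.0 <= v < c:
        raise ValueError("the speed must satisfy 0 <= v < c")
    return t * math.sqrt(1.0 - (v / c) ** 2)

C_KM_PER_S = 299_792.458   # speed of light in km/s

# 10 years at 60% of the speed of light: about 8 years pass on the vehicle,
# whether v is given as a fraction of c or in km/s.
print(contracted_time(10.0, 0.6))
print(contracted_time(10.0, 0.6 * C_KM_PER_S, C_KM_PER_S))
```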


EXERCISES 
1. Prove (b), (c), and (d) of Theorem 6.39. 


2. Complete the proof of Theorem 6.40 for the case t < 0. 


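3. For
$$w_1 = \begin{pmatrix}1\\0\\0\\1\end{pmatrix} \quad\text{and}\quad w_2 = \begin{pmatrix}1\\0\\0\\-1\end{pmatrix},$$
show that
(a) $\{w_1, w_2\}$ is an orthogonal basis for $\operatorname{span}(\{e_1, e_4\})$;
(b) $\operatorname{span}(\{e_1, e_4\})$ is $T_v^* L_A T_v$-invariant.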
4. Prove the corollary to Theorem 6.41.
Hints:
(a) Prove that
$$B_v^* A B_v = \begin{pmatrix}p&0&0&q\\0&1&0&0\\0&0&1&0\\-q&0&0&-p\end{pmatrix},$$
where
$$p = \frac{a+b}{2} \quad\text{and}\quad q = \frac{a-b}{2}.$$
(b) Show that $q = 0$ by using the fact that $B_v^* A B_v$ is self-adjoint.
(c) Apply Theorem 6.40 to
$$w = \begin{pmatrix}0\\1\\0\\1\end{pmatrix}$$
to show that $p = 1$.
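5. Derive (24), and prove that
$$T_v\begin{pmatrix}0\\0\\0\\1\end{pmatrix} = \begin{pmatrix}\dfrac{-v}{\sqrt{1-v^2}}\\0\\0\\\dfrac{1}{\sqrt{1-v^2}}\end{pmatrix}. \tag{25}$$
Hint: Use a technique similar to the derivation of (22).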


6. Consider three coordinate systems S, $S'$, and $S''$ with the corresponding axes ($x, x', x''$; $y, y', y''$; and $z, z', z''$) parallel and such that the $x$-, $x'$-, and $x''$-axes coincide. Suppose that $S'$ is moving past S at a velocity $v_1 > 0$ (as measured on S), $S''$ is moving past $S'$ at a velocity $v_2 > 0$ (as measured on $S'$), and $S''$ is moving past S at a velocity $v_3 > 0$ (as measured on S), and that there are three clocks C, $C'$, and $C''$ such that C is stationary relative to S, $C'$ is stationary relative to $S'$, and $C''$ is stationary relative to $S''$. Suppose that when measured on any of the three clocks, all the origins of S, $S'$, and $S''$ coincide at time 0. Assuming that $T_{v_3} = T_{v_2}T_{v_1}$ (i.e., $B_{v_3} = B_{v_2}B_{v_1}$), prove that
$$v_3 = \frac{v_1 + v_2}{1 + v_1 v_2}.$$
Note that substituting $v_2 = 1$ in this equation yields $v_3 = 1$. This tells us that the speed of light as measured in S or $S'$ is the same. Why would we be surprised if this were not the case?


7. Compute $(B_v)^{-1}$. Show $(B_v)^{-1} = B_{(-v)}$. Conclude that if $S'$ moves at a negative velocity v relative to S, then $[T_v]_\beta = B_v$, where $B_v$ is of the form given in Theorem 6.42.


8. Suppose that an astronaut left Earth in the year 2000 and traveled to a star 99 light years away from Earth at 99% of the speed of light and that upon reaching the star immediately turned around and returned to Earth at the same speed. Assuming Einstein’s special theory of relativity, show that if the astronaut was 20 years old at the time of departure, then he or she would return to Earth at age 48.2 in the year 2200. Explain the use of Exercise 7 in solving this problem.


9. Recall the moving space vehicle considered in the study of time contraction. Suppose that the vehicle is moving toward a fixed star located on the x-axis of S at a distance b units from the origin of S. If the space vehicle moves toward the star at velocity v, Earthlings (who remain “almost” stationary relative to S) compute the time it takes for the vehicle to reach the star as $t = b/v$. Due to the phenomenon of time contraction, the astronaut perceives a time span of $t' = t\sqrt{1-v^2} = (b/v)\sqrt{1-v^2}$. A paradox appears in that the astronaut perceives a time span inconsistent with a distance of b and a velocity of v. The paradox is resolved by observing that the distance from the solar system to the star as measured by the astronaut is less than b.


Assuming that the coordinate systems S and S’ and clocks C and C’ 
are as in the discussion of time contraction, prove the following results. 


(a) At time t (as measured on C), the space-time coordinates of the star relative to S and C are
$$\begin{pmatrix}b\\0\\0\\t\end{pmatrix}.$$
(b) At time t (as measured on C), the space-time coordinates of the star relative to $S'$ and $C'$ are
$$\begin{pmatrix}\dfrac{b-vt}{\sqrt{1-v^2}}\\0\\0\\\dfrac{t-bv}{\sqrt{1-v^2}}\end{pmatrix}.$$
(c) For
$$x' = \frac{b - tv}{\sqrt{1-v^2}} \quad\text{and}\quad t' = \frac{t - bv}{\sqrt{1-v^2}},$$
we have $x' = b\sqrt{1-v^2} - t'v$.

This result may be interpreted to mean that at time $t'$ as measured by the astronaut, the distance from the astronaut to the star as measured by the astronaut (see Figure 6.9) is
$$b\sqrt{1-v^2} - t'v.$$

[Figure 6.9: the star, with coordinates (b, 0, 0) relative to S and (x′, 0, 0) relative to S′]


(d) Conclude from the preceding equation that
(1) the speed of the space vehicle relative to the star, as measured by the astronaut, is v;
(2) the distance from Earth to the star, as measured by the astronaut, is $b\sqrt{1-v^2}$.
Thus distances along the line of motion of the space vehicle appear to be contracted by a factor of $\sqrt{1-v^2}$.


6.10* CONDITIONING AND THE RAYLEIGH QUOTIENT 


In Section 3.4, we studied specific techniques that allow us to solve systems of 
linear equations in the form Ax = b, where A is an m × n matrix and b is an 
m × 1 vector. Such systems often arise in applications to the real world. The 
coefficients in the system are frequently obtained from experimental data, 
and, in many cases, both m and n are so large that a computer must be used 
in the calculation of the solution. Thus two types of errors must be considered. 
First, experimental errors arise in the collection of data since no instruments 
can provide completely accurate measurements. Second, computers introduce 
roundoff errors. One might intuitively feel that small relative changes in the 
coefficients of the system cause small relative errors in the solution. A system 
that has this property is called well-conditioned; otherwise, the system is 
called ill-conditioned. 


We now consider several examples of these types of errors, concentrating 
primarily on changes in b rather than on changes in the entries of A. In 
addition, we assume that A is a square, complex (or real), invertible matrix 
since this is the case most frequently encountered in applications. 
Example 1 


Consider the system
$$\begin{aligned} x_1 + x_2 &= 5\\ x_1 - x_2 &= 1. \end{aligned}$$
The solution to this system is
$$\begin{pmatrix}3\\2\end{pmatrix}.$$
Now suppose that we change the system somewhat and consider the new system
$$\begin{aligned} x_1 + x_2 &= 5\\ x_1 - x_2 &= 1.0001. \end{aligned}$$
This modified system has the solution
$$\begin{pmatrix}3.00005\\1.99995\end{pmatrix}.$$
We see that a change of $10^{-4}$ in one coefficient has caused a change of less than $10^{-4}$ in each coordinate of the new solution. More generally, the system
$$\begin{aligned} x_1 + x_2 &= 5\\ x_1 - x_2 &= 1 + \delta \end{aligned}$$
has the solution
$$\begin{pmatrix}3 + \delta/2\\2 - \delta/2\end{pmatrix}.$$
Hence small changes in b introduce small changes in the solution. Of course, we are really interested in relative changes since a change in the solution of, say, 10, is considered large if the original solution is of the order $10^2$, but small if the original solution is of the order $10^6$.

We use the notation $\delta b$ to denote the vector $b' - b$, where b is the vector in the original system and $b'$ is the vector in the modified system. Thus we have
$$\delta b = \begin{pmatrix}5\\1+\delta\end{pmatrix} - \begin{pmatrix}5\\1\end{pmatrix} = \begin{pmatrix}0\\\delta\end{pmatrix}.$$
We now define the relative change in b to be the scalar $\|\delta b\|/\|b\|$, where $\|\cdot\|$ denotes the standard norm on $\mathbb{C}^n$ (or $\mathbb{R}^n$); that is, $\|b\| = \sqrt{\langle b, b\rangle}$. Most of what follows, however, is true for any norm. Similar definitions hold for the relative change in x. In this example,
$$\frac{\|\delta x\|}{\|x\|} = \frac{1}{\sqrt{13}}\left\|\begin{pmatrix}3+\delta/2\\2-\delta/2\end{pmatrix} - \begin{pmatrix}3\\2\end{pmatrix}\right\| = \frac{|\delta|}{\sqrt{26}} \quad\text{and}\quad \frac{\|\delta b\|}{\|b\|} = \frac{|\delta|}{\sqrt{26}}.$$
Thus the relative change in x equals, coincidentally, the relative change in b; so the system is well-conditioned.
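The computation in Example 1 can be reproduced with a short script. This is an illustrative sketch in standard-library Python; `solve2` (Cramer’s rule for a 2 × 2 system), `norm`, and `relative_changes` are our own helper names, and, as in the example, only the second entry of b is perturbed.

```python
import math

def solve2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system by Cramer's rule (assumes a nonzero determinant)."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)

def norm(u):
    """Standard (Euclidean) norm."""
    return math.sqrt(sum(ui * ui for ui in u))

def relative_changes(coeffs, b, delta):
    """Return (||dx||/||x||, ||db||/||b||) when the second entry of b is changed by delta."""
    x = solve2(*coeffs, *b)
    b_new = (b[0], b[1] + delta)
    x_new = solve2(*coeffs, *b_new)
    dx = [p - q for p, q in zip(x_new, x)]
    db = [p - q for p, q in zip(b_new, b)]
    return norm(dx) / norm(x), norm(db) / norm(b)

# Example 1: x1 + x2 = 5, x1 - x2 = 1 + delta.
print(solve2(1, 1, 1, -1, 5, 1))           # (3.0, 2.0)
rel_x, rel_b = relative_changes((1, 1, 1, -1), (5, 1), 1e-4)
print(rel_x, rel_b, 1e-4 / math.sqrt(26))  # three nearly equal numbers
```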


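Example 2

Consider the system
$$\begin{aligned} x_1 + x_2 &= 3\\ x_1 + 1.00001x_2 &= 3.00001, \end{aligned}$$
which has
$$\begin{pmatrix}2\\1\end{pmatrix}$$
as its solution. The solution to the related system
$$\begin{aligned} x_1 + x_2 &= 3\\ x_1 + 1.00001x_2 &= 3.00001 + \delta \end{aligned}$$
is
$$\begin{pmatrix}2 - 10^5\delta\\1 + 10^5\delta\end{pmatrix}.$$
Hence,
$$\frac{\|\delta x\|}{\|x\|} = 10^5\sqrt{2/5}\,|\delta| \ge 10^4|\delta|,$$
while
$$\frac{\|\delta b\|}{\|b\|} \approx \frac{|\delta|}{3\sqrt{2}}.$$
Thus the relative change in x is at least $10^4$ times the relative change in b! This system is very ill-conditioned. Observe that the lines defined by the two equations are nearly coincident. So a small change in either line could greatly alter the point of intersection, that is, the solution to the system.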
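The same kind of experiment on Example 2 exposes the ill-conditioning. In the sketch below (illustrative only; `solve2` and `norm` are our own helpers, and $\delta = 10^{-8}$ is an arbitrary choice), the ratio of the two relative changes comes out far above $10^4$.

```python
import math

def solve2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system by Cramer's rule (assumes a nonzero determinant)."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)

def norm(u):
    return math.sqrt(sum(ui * ui for ui in u))

coeffs = (1.0, 1.0, 1.0, 1.00001)   # x1 + x2 = 3, x1 + 1.00001 x2 = 3.00001
b = (3.0, 3.00001)
delta = 1e-8

x = solve2(*coeffs, *b)                        # approximately (2, 1)
x_new = solve2(*coeffs, b[0], b[1] + delta)    # approximately (2 - 1e5*delta, 1 + 1e5*delta)
rel_x = norm([p - q for p, q in zip(x_new, x)]) / norm(x)
rel_b = abs(delta) / norm(b)
# Amplification of the relative error: roughly 2.7e5 here, consistent with "at least 1e4".
print(rel_x / rel_b)
```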




To apply the full strength of the theory of self-adjoint matrices to the 
study of conditioning, we need the notion of the norm of a matrix. (See 
Exercise 24 of Section 6.1 for further results about norms.) 


Definition. Let A be a complex (or real) n × n matrix. Define the (Euclidean) norm of A by
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|},$$
where $x \in \mathbb{C}^n$ or $x \in \mathbb{R}^n$.


Intuitively, $\|A\|$ represents the maximum magnification of a vector by the 
matrix A. The question of whether or not this maximum exists, as well as 
the problem of how to compute it, can be answered by the use of the so-called 
Rayleigh quotient. 


Definition. Let B be an n × n self-adjoint matrix. The Rayleigh quotient for $x \neq 0$ is defined to be the scalar $R(x) = \langle Bx, x\rangle/\|x\|^2$.


The following result characterizes the extreme values of the Rayleigh quotient of a self-adjoint matrix.

Theorem 6.43. For a self-adjoint matrix $B \in M_{n\times n}(F)$, we have that $\max_{x\neq 0} R(x)$ is the largest eigenvalue of B and $\min_{x\neq 0} R(x)$ is the smallest eigenvalue of B.


Proof. By Theorems 6.19 (p. 384) and 6.20 (p. 384), we may choose an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ of eigenvectors of B such that $Bv_i = \lambda_i v_i$ $(1 \le i \le n)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. (Recall that by the lemma to Theorem 6.17, p. 373, the eigenvalues of B are real.) Now, for $x \in F^n$, there exist scalars $a_1, a_2, \ldots, a_n$ such that
$$x = \sum_{i=1}^n a_i v_i.$$
Hence
$$R(x) = \frac{\langle Bx, x\rangle}{\|x\|^2} = \frac{\left\langle \sum_{i=1}^n a_i\lambda_i v_i, \sum_{j=1}^n a_j v_j\right\rangle}{\|x\|^2} = \frac{\sum_{i=1}^n \lambda_i |a_i|^2}{\|x\|^2} \le \frac{\lambda_1 \sum_{i=1}^n |a_i|^2}{\|x\|^2} = \frac{\lambda_1\|x\|^2}{\|x\|^2} = \lambda_1.$$
It is easy to see that $R(v_1) = \lambda_1$, so we have demonstrated the first half of the theorem. The second half is proved similarly. ∎
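Theorem 6.43 is easy to probe numerically. The sketch below (standard-library Python; the function name `rayleigh`, the sample matrix, and the random sampling are our own choices) uses the symmetric matrix with rows (2, 1) and (1, 2), whose eigenvalues are 3 (eigenvector (1, 1)) and 1 (eigenvector (1, −1)). Random nonzero vectors give Rayleigh quotients inside [1, 3], and the eigenvectors attain the endpoints.

```python
import random

def rayleigh(B, x):
    """R(x) = <Bx, x> / ||x||^2 for a real symmetric matrix B and a nonzero real vector x."""
    Bx = [sum(bij * xj for bij, xj in zip(row, x)) for row in B]
    return sum(p * q for p, q in zip(Bx, x)) / sum(xi * xi for xi in x)

B = [[2.0, 1.0], [1.0, 2.0]]          # eigenvalues 3 and 1
rng = random.Random(0)                # fixed seed so the run is reproducible
samples = [rayleigh(B, [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]) for _ in range(10_000)]
print(min(samples), max(samples))     # both inside [1, 3]
print(rayleigh(B, [1.0, 1.0]), rayleigh(B, [1.0, -1.0]))   # 3.0 1.0
```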




Corollary 1. For any square matrix A, $\|A\|$ is finite and, in fact, equals $\sqrt{\lambda}$, where $\lambda$ is the largest eigenvalue of $A^*A$.

Proof. Let B be the self-adjoint matrix $A^*A$, and let $\lambda$ be the largest eigenvalue of B. Since, for $x \neq 0$,
$$0 \le \frac{\|Ax\|^2}{\|x\|^2} = \frac{\langle Ax, Ax\rangle}{\|x\|^2} = \frac{\langle A^*Ax, x\rangle}{\|x\|^2} = \frac{\langle Bx, x\rangle}{\|x\|^2} = R(x),$$
it follows from Theorem 6.43 that $\|A\|^2 = \lambda$. ∎


Observe that the proof of Corollary 1 shows that all the eigenvalues of 
A* A are nonnegative. For our next result, we need the following lemma. 


Lemma. For any square matrix A, $\lambda$ is an eigenvalue of $A^*A$ if and only if $\lambda$ is an eigenvalue of $AA^*$.

Proof. Let $\lambda$ be an eigenvalue of $A^*A$. If $\lambda = 0$, then $A^*A$ is not invertible. Hence A and $A^*$ are not invertible, so that $\lambda$ is also an eigenvalue of $AA^*$. The proof of the converse is similar.

Suppose now that $\lambda \neq 0$. Then there exists $x \neq 0$ such that $A^*Ax = \lambda x$. Apply A to both sides to obtain $(AA^*)(Ax) = \lambda(Ax)$. Since $Ax \neq 0$ (lest $\lambda x = 0$), we have that $\lambda$ is an eigenvalue of $AA^*$. The proof of the converse is left as an exercise.


Corollary 2. Let A be an invertible matrix. Then $\|A^{-1}\| = 1/\sqrt{\lambda}$, where $\lambda$ is the smallest eigenvalue of $A^*A$.

Proof. Recall that $\lambda$ is an eigenvalue of an invertible matrix if and only if $\lambda^{-1}$ is an eigenvalue of its inverse.

Now let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the eigenvalues of $A^*A$, which by the lemma are the eigenvalues of $AA^*$. Then $\|A^{-1}\|^2$ equals the largest eigenvalue of $(A^{-1})^*A^{-1} = (AA^*)^{-1}$, which equals $1/\lambda_n$. ∎
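For a real 2 × 2 matrix the eigenvalues of $A^*A = A^TA$ have a closed form, so Corollaries 1 and 2 can be checked against a brute-force search for the maximum magnification. The sketch below is our own illustration (the helper names are hypothetical); it uses the sample matrix A with rows (3, 0) and (4, 5), for which $A^TA$ has eigenvalues 45 and 5, so $\|A\| = \sqrt{45}$ and $\|A^{-1}\| = 1/\sqrt{5}$.

```python
import math

def gram_eigenvalues_2x2(A):
    """Eigenvalues (largest first) of A^T A for a real 2x2 matrix A."""
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # A^T A = [[p, q], [q, r]]
    mid, rad = (p + r) / 2, math.hypot((p - r) / 2, q)
    return mid + rad, mid - rad

def sampled_norm(A, steps=20_000):
    """Approximate max ||Ax|| over unit vectors x (a half circle suffices since ||A(-x)|| = ||Ax||)."""
    best = 0.0
    for k in range(steps):
        th = math.pi * k / steps
        x = (math.cos(th), math.sin(th))
        best = max(best, math.hypot(A[0][0] * x[0] + A[0][1] * x[1],
                                    A[1][0] * x[0] + A[1][1] * x[1]))
    return best

A = [[3.0, 0.0], [4.0, 5.0]]
lam1, lam_n = gram_eigenvalues_2x2(A)
print(lam1, lam_n)                                 # 45.0 5.0
print(math.sqrt(lam1), sampled_norm(A))            # Corollary 1: ||A|| = sqrt(45), about 6.708
A_inv = [[5.0 / 15, 0.0], [-4.0 / 15, 3.0 / 15]]   # inverse of A (det A = 15)
print(1 / math.sqrt(lam_n), sampled_norm(A_inv))   # Corollary 2: ||A^-1|| = 1/sqrt(5), about 0.447
```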


For many applications, it is only the largest and smallest eigenvalues that 
are of interest. For example, in the case of vibration problems, the smallest 
eigenvalue represents the lowest frequency at which vibrations can occur. 

We see the role of both of these eigenvalues in our study of conditioning. 


Example 3
Let
$$A = \begin{pmatrix}1&0&1\\-1&1&0\\0&1&1\end{pmatrix}.$$
Then
$$B = A^*A = \begin{pmatrix}2&-1&1\\-1&2&1\\1&1&2\end{pmatrix}.$$
The eigenvalues of B are 3, 3, and 0. Therefore, $\|A\| = \sqrt{3}$. For any
$$x = \begin{pmatrix}a\\b\\c\end{pmatrix} \neq 0,$$
we may compute $R(x)$ for the matrix B as
$$R(x) = \frac{\langle Bx, x\rangle}{\|x\|^2} = \frac{2(a^2 + b^2 + c^2 - ab + ac + bc)}{a^2 + b^2 + c^2}.$$
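Example 3 can be verified exactly with integer arithmetic. The sketch below (our own illustration in standard-library Python, with hypothetical helper names) forms $B = A^TA$ (A is real, so $A^* = A^T$), checks eigenvectors for the eigenvalues 3, 3, and 0, and compares the closed-form expression for $R(x)$ with the definition, using `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction

A = [[1, 0, 1], [-1, 1, 0], [0, 1, 1]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def apply(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

B = matmul(transpose(A), A)
print(B)   # [[2, -1, 1], [-1, 2, 1], [1, 1, 2]]

# Eigenvectors: (1, -1, 0) and (1, 1, 2) for eigenvalue 3, and (1, 1, -1) for eigenvalue 0.
for vec, lam in [([1, -1, 0], 3), ([1, 1, 2], 3), ([1, 1, -1], 0)]:
    assert apply(B, vec) == [lam * c for c in vec]

def R_formula(x):
    """Closed form of R(x) from Example 3."""
    a, b, c = x
    return Fraction(2 * (a * a + b * b + c * c - a * b + a * c + b * c), a * a + b * b + c * c)

def R_definition(x):
    """<Bx, x> / ||x||^2 computed directly."""
    return Fraction(sum(p * q for p, q in zip(apply(B, x), x)), sum(t * t for t in x))

for x in ([1, 0, 0], [1, 2, 3], [1, 1, -1], [4, -7, 2]):
    assert R_formula(x) == R_definition(x)
```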


Now that we know $\|A\|$ exists for every square matrix A, we can make use of the inequality $\|Ax\| \le \|A\|\cdot\|x\|$, which holds for every x.

Assume in what follows that A is invertible, $b \neq 0$, and $Ax = b$. For a given $\delta b$, let $\delta x$ be the vector that satisfies $A(x + \delta x) = b + \delta b$. Then $A(\delta x) = \delta b$, and so $\delta x = A^{-1}(\delta b)$. Hence
$$\|b\| = \|Ax\| \le \|A\|\cdot\|x\| \quad\text{and}\quad \|\delta x\| = \|A^{-1}(\delta b)\| \le \|A^{-1}\|\cdot\|\delta b\|.$$
Thus
$$\frac{\|\delta x\|}{\|x\|} \le \frac{\|A^{-1}\|\cdot\|\delta b\|}{\|b\|/\|A\|} = \|A\|\cdot\|A^{-1}\|\cdot\frac{\|\delta b\|}{\|b\|}.$$
Similarly (see Exercise 9),
$$\frac{1}{\|A\|\cdot\|A^{-1}\|}\left(\frac{\|\delta b\|}{\|b\|}\right) \le \frac{\|\delta x\|}{\|x\|}.$$

The number $\|A\|\cdot\|A^{-1}\|$ is called the condition number of A and is denoted cond(A). It should be noted that the definition of cond(A) depends on how the norm of A is defined. There are many reasonable ways of defining the norm of a matrix. In fact, the only property needed to establish the inequalities above is that $\|Ax\| \le \|A\|\cdot\|x\|$ for all x. We summarize these results in the following theorem.


Theorem 6.44. For the system $Ax = b$, where A is invertible and $b \neq 0$, the following statements are true.
(a) For any norm $\|\cdot\|$, we have
$$\frac{1}{\operatorname{cond}(A)}\frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|} \le \operatorname{cond}(A)\frac{\|\delta b\|}{\|b\|}.$$
(b) If $\|\cdot\|$ is the Euclidean norm, then $\operatorname{cond}(A) = \sqrt{\lambda_1/\lambda_n}$, where $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues, respectively, of $A^*A$.


Proof. Statement (a) follows from the previous inequalities, and (b) follows from Corollaries 1 and 2 to Theorem 6.43. ∎
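For 2 × 2 matrices, Theorem 6.44(b) gives a direct way to compute cond(A). The sketch below is our own illustration (standard-library Python; helper names hypothetical). To avoid cancellation when $\lambda_n$ is tiny, it recovers $\lambda_n$ from $\lambda_1\lambda_n = \det(A^TA) = \det(A)^2$ rather than from the quadratic formula. Applied to the matrix of Example 2 it gives a condition number of about $4 \times 10^5$, and the perturbation used in that example respects both bounds in Theorem 6.44(a).

```python
import math

def cond_2x2(A):
    """Euclidean condition number sqrt(lam1 / lam_n) of an invertible real 2x2 matrix."""
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # A^T A = [[p, q], [q, r]]
    lam1 = (p + r) / 2 + math.hypot((p - r) / 2, q)         # largest eigenvalue of A^T A
    lam_n = (a * d - b * c) ** 2 / lam1                      # since lam1 * lam_n = det(A)^2
    return math.sqrt(lam1 / lam_n)

def solve2(A, b):
    """Cramer's rule for a 2x2 system."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - b[0] * a21) / det]

A = [[1.0, 1.0], [1.0, 1.00001]]     # the matrix of Example 2
b = [3.0, 3.00001]
db = [0.0, 1e-8]
x = solve2(A, b)
dx = [p - q for p, q in zip(solve2(A, [b[0] + db[0], b[1] + db[1]]), x)]
rel_x = math.hypot(*dx) / math.hypot(*x)
rel_b = math.hypot(*db) / math.hypot(*b)
k = cond_2x2(A)
print(k)                                   # about 4.0e5
print(rel_b / k <= rel_x <= k * rel_b)     # True: Theorem 6.44(a)
```

As the text notes after the theorem, a scalar multiple of an orthogonal matrix (for example the matrix with rows (3, −4) and (4, 3)) has condition number 1.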


It is clear from Theorem 6.44 that $\operatorname{cond}(A) \ge 1$. It is left as an exercise to prove that $\operatorname{cond}(A) = 1$ if and only if A is a scalar multiple of a unitary or orthogonal matrix. Moreover, it can be shown with some work that equality can be obtained in (a) by an appropriate choice of b and $\delta b$.

We can see immediately from (a) that if cond(A) is close to 1, then a small relative error in b forces a small relative error in x. If cond(A) is large, however, then the relative error in x may be small even though the relative error in b is large, or the relative error in x may be large even though the relative error in b is small! In short, cond(A) merely indicates the potential for large relative errors.

We have so far considered only errors in the vector b. If there is an error $\delta A$ in the coefficient matrix of the system $Ax = b$, the situation is more complicated. For example, $A + \delta A$ may fail to be invertible. But under the appropriate assumptions, it can be shown that a bound for the relative error in x can be given in terms of cond(A). For example, Charles Cullen (Charles G. Cullen, An Introduction to Numerical Linear Algebra, PWS Publishing Co., Boston 1994, p. 60) shows that if $A + \delta A$ is invertible, then
$$\frac{\|\delta x\|}{\|x + \delta x\|} \le \operatorname{cond}(A)\frac{\|\delta A\|}{\|A\|}.$$
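The bound quoted from Cullen can be spot-checked in the same setting. In this sketch (our own illustration; the perturbation $\delta A$, which changes only the (2, 2) entry by $2 \times 10^{-6}$, is an arbitrary choice), the Euclidean norms of 2 × 2 matrices are computed from Corollary 1 and cond(A) from Theorem 6.44(b).

```python
import math

def gram_eigs(M):
    """Largest and smallest eigenvalues of M^T M for a real 2x2 matrix M."""
    (a, b), (c, d) = M
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    lam1 = (p + r) / 2 + math.hypot((p - r) / 2, q)
    lam_n = (a * d - b * c) ** 2 / lam1 if lam1 > 0 else 0.0
    return lam1, lam_n

def opnorm(M):
    return math.sqrt(gram_eigs(M)[0])          # Corollary 1

def cond(M):
    lam1, lam_n = gram_eigs(M)
    return math.sqrt(lam1 / lam_n)             # Theorem 6.44(b)

def solve2(M, b):
    (a11, a12), (a21, a22) = M
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - b[0] * a21) / det]

A = [[1.0, 1.0], [1.0, 1.00001]]               # the matrix of Example 2
dA = [[0.0, 0.0], [0.0, 2e-6]]                 # hypothetical perturbation of one entry
b = [3.0, 3.00001]

x = solve2(A, b)
x_pert = solve2([[A[i][j] + dA[i][j] for j in range(2)] for i in range(2)], b)   # x + dx
dx = [p - q for p, q in zip(x_pert, x)]
lhs = math.hypot(*dx) / math.hypot(*x_pert)
rhs = cond(A) * opnorm(dA) / opnorm(A)
print(lhs, rhs, lhs <= rhs)   # roughly 0.10, 0.40, True
```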

It should be mentioned that, in practice, one never computes cond(A) from its definition, for it would be an unnecessary waste of time to compute $A^{-1}$ merely to determine its norm. In fact, if a computer is used to find $A^{-1}$, the computed inverse of A in all likelihood only approximates $A^{-1}$, and the error in the computed inverse is affected by the size of cond(A). So we are caught in a vicious circle! There are, however, some situations in which a usable approximation of cond(A) can be found. Thus, in most cases, the estimate of the relative error in x is based on an estimate of cond(A).


EXERCISES 


1. Label the following statements as true or false. 


(a) If Ax = b is well-conditioned, then cond(A) is small. 

(b) If cond(A) is large, then Ax = b is ill-conditioned. 

(c) If cond(A) is small, then Ax = b is well-conditioned. 

(d) The norm of A equals the Rayleigh quotient. 

(e) The norm of A always equals the largest eigenvalue of A. 
2. Compute the norms of the following matrices. 


©) © 9 fas 
‘ 


3. Prove that if B is symmetric, then $\|B\|$ is the largest absolute value of an eigenvalue of B.
4. Let A and $A^{-1}$ be as follows:
$$A = \begin{pmatrix}6&13&-17\\13&29&-38\\-17&-38&50\end{pmatrix} \quad\text{and}\quad A^{-1} = \begin{pmatrix}6&-4&-1\\-4&11&7\\-1&7&5\end{pmatrix}.$$
The eigenvalues of A are approximately 84.74, 0.2007, and 0.0588.

(a) Approximate $\|A\|$, $\|A^{-1}\|$, and cond(A). (Note Exercise 3.)

(b) Suppose that we have vectors x and $\tilde{x}$ such that $Ax = b$ and $\|b - A\tilde{x}\| \le 0.001$. Use (a) to determine upper bounds for $\|\tilde{x} - A^{-1}b\|$ (the absolute error) and $\|\tilde{x} - A^{-1}b\|/\|A^{-1}b\|$ (the relative error).


5. Suppose that x is the actual solution of $Ax = b$ and that a computer arrives at an approximate solution $\tilde{x}$. If $\operatorname{cond}(A) = 100$, $\|b\| = 1$, and $\|b - A\tilde{x}\| = 0.1$, obtain upper and lower bounds for $\|x - \tilde{x}\|/\|x\|$.


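6. Let
$$B = \begin{pmatrix}2&1&1\\1&2&1\\1&1&2\end{pmatrix}.$$
Compute
$$R\begin{pmatrix}1\\-2\\3\end{pmatrix}, \quad \|B\|, \quad\text{and}\quad \operatorname{cond}(B).$$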


7. Let B be a symmetric matrix. Prove that $\min_{x\neq 0} R(x)$ equals the smallest eigenvalue of B.


8. Prove that if $\lambda$ is an eigenvalue of $AA^*$, then $\lambda$ is an eigenvalue of $A^*A$. This completes the proof of the lemma to Corollary 2 to Theorem 6.43.


9. Prove that if A is an invertible matrix and $Ax = b$, then
$$\frac{1}{\|A\|\cdot\|A^{-1}\|}\left(\frac{\|\delta b\|}{\|b\|}\right) \le \frac{\|\delta x\|}{\|x\|}.$$

10. Prove the left inequality of (a) in Theorem 6.44.


11. Prove that cond(A) = 1 if and only if A is a scalar multiple of a unitary 
or orthogonal matrix. 


12. (a) Let A and B be square matrices that are unitarily equivalent. Prove that $\|A\| = \|B\|$.
(b) Let T be a linear operator on a finite-dimensional inner product space V. Define
$$\|T\| = \max_{x \neq 0} \frac{\|T(x)\|}{\|x\|}.$$
Prove that $\|T\| = \|[T]_\beta\|$, where $\beta$ is any orthonormal basis for V.
(c) Let V be an infinite-dimensional inner product space with an orthonormal basis $\{v_1, v_2, \ldots\}$. Let T be the linear operator on V such that $T(v_k) = kv_k$. Prove that $\|T\|$ (defined in (b)) does not exist.


The next exercise assumes the definitions of singular value and pseudoinverse 
and the results of Section 6.7. 


13. Let A be an n × n matrix of rank r with the nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. Prove each of the following results.
(a) $\|A\| = \sigma_1$.
(b) $\|A^\dagger\| = \dfrac{1}{\sigma_r}$.
(c) If A is invertible (and hence $r = n$), then $\operatorname{cond}(A) = \dfrac{\sigma_1}{\sigma_n}$.


6.11* THE GEOMETRY OF ORTHOGONAL OPERATORS 


By Theorem 6.22 (p. 386), any rigid motion on a finite-dimensional real inner 
product space is the composite of an orthogonal operator and a translation. 
Thus, to understand the geometry of rigid motions thoroughly, we must ana- 
lyze the structure of orthogonal operators. Such is the aim of this section. We 
show that any orthogonal operator on a finite-dimensional real inner product 
space is the composite of rotations and reflections. 

This material assumes familiarity with the results about direct sums de- 
veloped at the end of Section 5.2, and familiarity with the definition and 
elementary properties of the determinant of a linear operator defined in Ex- 
ercise 7 of Section 5.1. 


Definitions. Let T be a linear operator on a finite-dimensional real inner product space V. The operator T is called a rotation if T is the identity on V or if there exists a two-dimensional subspace W of V, an orthonormal basis $\beta = \{x_1, x_2\}$ for W, and a real number $\theta$ such that
$$T(x_1) = (\cos\theta)x_1 + (\sin\theta)x_2, \qquad T(x_2) = (-\sin\theta)x_1 + (\cos\theta)x_2,$$
and $T(y) = y$ for all $y \in W^\perp$. In this context, T is called a rotation of W about $W^\perp$. The subspace $W^\perp$ is called the axis of rotation.
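The definition of a rotation translates directly into a matrix. The sketch below (standard-library Python; `rotation_matrix` and the other helper names are our own) builds the matrix, relative to the standard basis of $\mathbb{R}^n$, of the rotation of $W = \operatorname{span}(\{x_1, x_2\})$ about $W^\perp$, assuming $x_1, x_2$ are orthonormal. It uses the formula $T(v) = v + (\cos\theta - 1)(a_1x_1 + a_2x_2) + \sin\theta\,(a_1x_2 - a_2x_1)$ with $a_k = \langle v, x_k\rangle$, which agrees with the definition on $x_1$, on $x_2$, and on $W^\perp$.

```python
import math

def dot(u, w):
    return sum(p * q for p, q in zip(u, w))

def rotation_matrix(x1, x2, theta):
    """Standard-basis matrix of the rotation of W = span{x1, x2} about W-perp (x1, x2 orthonormal)."""
    n = len(x1)
    c, s = math.cos(theta), math.sin(theta)

    def T(v):
        a1, a2 = dot(v, x1), dot(v, x2)
        return [v[i] + (c - 1) * (a1 * x1[i] + a2 * x2[i]) + s * (a1 * x2[i] - a2 * x1[i])
                for i in range(n)]

    columns = [T([1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]

def apply(M, v):
    return [dot(row, v) for row in M]

def det3(M):
    (a, b, c), (d, e, f), (g, h, k) = M
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)

r = 1 / math.sqrt(2)
x1, x2 = [r, r, 0.0], [0.0, 0.0, 1.0]    # orthonormal basis of W; the axis W-perp is span{(1, -1, 0)}
M = rotation_matrix(x1, x2, math.pi / 3)
print(apply(M, [1.0, -1.0, 0.0]))         # the axis is fixed: about [1, -1, 0]
print(det3(M))                            # about 1, as expected for a rotation
```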


Rotations are defined in Section 2.1 for the special case that $V = \mathbb{R}^2$.


Definitions. Let T be a linear operator on a finite-dimensional real inner product space V. The operator T is called a reflection if there exists a one-dimensional subspace W of V such that $T(x) = -x$ for all $x \in W$ and $T(y) = y$ for all $y \in W^\perp$. In this context, T is called a reflection of V about $W^\perp$.
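Concretely, if $W = \operatorname{span}(\{u\})$ with u a unit vector, the reflection of V about $W^\perp$ is $T(v) = v - 2\langle v, u\rangle u$, since this sends u to $-u$ and fixes every vector orthogonal to u. A short sketch (our own helper names, standard-library Python) builds its standard matrix $I - 2uu^T$; with $u = e_3$ it reproduces the operator $T(a, b, c) = (a, b, -c)$ of Example 2(b) in this section.

```python
import math

def reflection_matrix(u):
    """Standard matrix I - 2 u u^T of the reflection about W-perp, where W = span{u}; u is normalized first."""
    length = math.sqrt(sum(c * c for c in u))
    u = [c / length for c in u]
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)] for i in range(n)]

def apply(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

# W = span{e3}: the reflection of R^3 about the xy-plane, T(a, b, c) = (a, b, -c).
print(reflection_matrix([0.0, 0.0, 1.0]))   # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

# A less obvious case: W = span{(1, 2, 2)}; (2, -1, 0) lies in W-perp.
M = reflection_matrix([1.0, 2.0, 2.0])
print(apply(M, [1.0, 2.0, 2.0]))    # about [-1, -2, -2]
print(apply(M, [2.0, -1.0, 0.0]))   # about [2, -1, 0]
```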


It should be noted that rotations and reflections (or composites of these) 
are orthogonal operators (see Exercise 2). The principal aim of this section 
is to establish that the converse is also true, that is, any orthogonal operator 
on a finite-dimensional real inner product space is the composite of rotations 
and reflections. 


Example 1

A Characterization of Orthogonal Operators on a One-Dimensional Real Inner Product Space

Let T be an orthogonal operator on a one-dimensional inner product space V. Choose any nonzero vector x in V. Then $V = \operatorname{span}(\{x\})$, and so $T(x) = \lambda x$ for some $\lambda \in \mathbb{R}$. Since T is orthogonal and $\lambda$ is an eigenvalue of T, $\lambda = \pm 1$. If $\lambda = 1$, then T is the identity on V, and hence T is a rotation. If $\lambda = -1$, then $T(x) = -x$ for all $x \in V$; so T is a reflection of V about $V^\perp = \{0\}$. Thus T is either a rotation or a reflection. Note that in the first case, $\det(T) = 1$, and in the second case, $\det(T) = -1$.


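Example 2
Some Typical Reflections

(a) Define $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ by $T(a, b) = (-a, b)$, and let $W = \operatorname{span}(\{e_1\})$. Then $T(x) = -x$ for all $x \in W$, and $T(y) = y$ for all $y \in W^\perp$. Thus T is a reflection of $\mathbb{R}^2$ about $W^\perp = \operatorname{span}(\{e_2\})$, the y-axis.

(b) Let $T\colon \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $T(a, b, c) = (a, b, -c)$, and let $W = \operatorname{span}(\{e_3\})$. Then $T(x) = -x$ for all $x \in W$, and $T(y) = y$ for all $y \in W^\perp = \operatorname{span}(\{e_1, e_2\})$, the xy-plane. Hence T is a reflection of $\mathbb{R}^3$ about $W^\perp$.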


Example 1 characterizes all orthogonal operators on a one-dimensional real inner product space. The following theorem characterizes all orthogonal operators on a two-dimensional real inner product space V. The proof follows from Theorem 6.23 (p. 387) since all two-dimensional real inner product spaces are structurally identical. For a rigorous justification, apply Theorem 2.21 (p. 104), where $\beta$ is an orthonormal basis for V. By Exercise 15 of Section 6.2, the resulting isomorphism $\phi_\beta\colon V \to \mathbb{R}^2$ preserves inner products. (See Exercise 8.)


Theorem 6.45. Let T be an orthogonal operator on a two-dimensional real inner product space V. Then T is either a rotation or a reflection. Furthermore, T is a rotation if and only if $\det(T) = 1$, and T is a reflection if and only if $\det(T) = -1$.


A complete description of the reflections of R^2 is given in Section 6.5.
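When V = R^2 with the standard inner product, Theorem 6.45 can be turned into a small classifier. The sketch below assumes its input is already orthogonal and inspects only the determinant. The function name `classify_2x2_orthogonal` is hypothetical. The method relies on the fact that an orthogonal 2 × 2 matrix has orthonormal columns. Its first column is therefore (cos t, sin t) for some t, and its second column is ±(−sin t, cos t).

```python
import math


def classify_2x2_orthogonal(m, tol=1e-12):
    """Classify a 2x2 matrix m, assumed orthogonal, by its determinant (Theorem 6.45).

    Returns ("rotation", t) when m = [[cos t, -sin t], [sin t, cos t]], and
    ("reflection", t / 2) when m = [[cos t, sin t], [sin t, -cos t]]; in the second
    case t / 2 is the angle (mod pi) of the fixed line, i.e. the axis of reflection.
    """
    (a, b), (c, d) = m
    det = a * d - b * c
    t = math.atan2(c, a)  # the first column is (cos t, sin t) in both cases
    if abs(det - 1) < tol:
        return "rotation", t
    if abs(det + 1) < tol:
        return "reflection", t / 2
    raise ValueError("determinant is not +1 or -1, so m is not orthogonal")


print(classify_2x2_orthogonal([[0, -1], [1, 0]]))   # rotation through pi/2
print(classify_2x2_orthogonal([[-1, 0], [0, 1]]))   # reflection about the y-axis (angle pi/2)
```

The axis angle is only meaningful modulo π, because a line through the origin does not distinguish t/2 from t/2 + π.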


Corollary. Let V be a two-dimensional real inner product space. The composite of a reflection and a rotation on V is a reflection on V.


Proof. If T_1 is a reflection on V and T_2 is a rotation on V, then by Theorem 6.45, det(T_1) = −1 and det(T_2) = 1. Let T = T_2T_1 be the composite. Since T_2 and T_1 are orthogonal, so is T. Moreover, det(T) = det(T_2) · det(T_1) = −1. Thus, by Theorem 6.45, T is a reflection. The proof for T_1T_2 is similar. ∎
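Here is a quick numerical check of the corollary, as a sketch with arbitrarily chosen angles (0.7 and 1.9 radians are our own choice). Both composites of a reflection and a rotation of R^2 have determinant −1. Like every 2 × 2 reflection matrix [[cos t, sin t], [sin t, −cos t]], both composites are also symmetric.

```python
import math


def rotation(t):
    """Matrix of the rotation of R^2 through the angle t."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def reflection(t):
    """Matrix of the reflection of R^2 about the line through 0 at angle t / 2."""
    return [[math.cos(t), math.sin(t)], [math.sin(t), -math.cos(t)]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


T1 = reflection(0.7)   # sample angles, chosen arbitrarily
T2 = rotation(1.9)
for composite in (matmul(T2, T1), matmul(T1, T2)):
    # det = det(T2) * det(T1) = 1 * (-1) = -1, so Theorem 6.45 says the composite is a reflection
    print(round(det(composite), 12))                          # -1.0
    # and it is symmetric, like every 2x2 reflection matrix of the form above
    print(abs(composite[0][1] - composite[1][0]) < 1e-12)     # True
```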


We now study orthogonal operators on spaces of higher dimension. 


Lemma. If T is a linear operator on a nonzero finite-dimensional real vector space V, then there exists a T-invariant subspace W of V such that 1 ≤ dim(W) ≤ 2.


Proof. Fix an ordered basis β = {y_1, y_2, ..., y_n} for V, and let A = [T]_β. Let φ_β: V → R^n be the linear transformation defined by φ_β(y_i) = e_i for i = 1, 2, ..., n. Then φ_β is an isomorphism, and, as we have seen in Section 2.4, the diagram in Figure 6.10 commutes, that is, L_Aφ_β = φ_βT. As a consequence, it suffices to show that there exists an L_A-invariant subspace Z of R^n such that 1 ≤ dim(Z) ≤ 2. If we then define W = φ_β^{−1}(Z), it follows that W satisfies the conclusions of the lemma (see Exercise 13).


[Figure 6.10: commutative diagram with T: V → V along the top, φ_β: V → R^n down each side, and L_A: R^n → R^n along the bottom.]
The matrix A can be considered as an n × n matrix over C and, as such, can be used to define a linear operator U on C^n by U(v) = Av. Since U is a linear operator on a finite-dimensional vector space over C, it has an eigenvalue λ ∈ C. Let x ∈ C^n be an eigenvector corresponding to λ. We may write λ = λ_1 + iλ_2, where λ_1 and λ_2 are real, and

$$x = \begin{pmatrix} a_1 + ib_1 \\ a_2 + ib_2 \\ \vdots \\ a_n + ib_n \end{pmatrix},$$

where the a_i's and b_i's are real. Thus, setting

$$x_1 = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \quad\text{and}\quad x_2 = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},$$

we have x = x_1 + ix_2, where x_1 and x_2 have real entries. Note that at least one of x_1 or x_2 is nonzero since x ≠ 0. Hence

$$U(x) = \lambda x = (\lambda_1 + i\lambda_2)(x_1 + ix_2) = (\lambda_1 x_1 - \lambda_2 x_2) + i(\lambda_1 x_2 + \lambda_2 x_1).$$

Similarly,

$$U(x) = A(x_1 + ix_2) = Ax_1 + iAx_2.$$


Comparing the real and imaginary parts of these two expressions for U(x), we conclude that

$$Ax_1 = \lambda_1 x_1 - \lambda_2 x_2 \quad\text{and}\quad Ax_2 = \lambda_1 x_2 + \lambda_2 x_1.$$

Finally, let Z = span({x_1, x_2}), the span being taken as a subspace of R^n. Since x_1 ≠ 0 or x_2 ≠ 0, Z is a nonzero subspace. Thus 1 ≤ dim(Z) ≤ 2, and the preceding pair of equations shows that Z is L_A-invariant.
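The lemma's construction can be traced on a concrete matrix. In this sketch, A is a rotation of R^3 through 90° about the z-axis. Its complex eigenpair λ = i, x = (1, −i, 0) was found by hand, which is an assumption of the example; the code does not compute eigenvalues. The code splits x into its real part x_1 and imaginary part x_2. It then checks the two equations that make Z = span({x_1, x_2}) invariant under L_A.

```python
def apply(m, v):
    """Return m @ v for plain Python lists (entries may be complex)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


A = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]        # rotation through 90 degrees about the z-axis
lam = 1j               # eigenvalue lam = lam1 + i*lam2, with lam1 = 0 and lam2 = 1
x = [1, -1j, 0]        # eigenvector found by hand (an assumption of this sketch)
assert apply(A, x) == [lam * c for c in x]

lam1, lam2 = lam.real, lam.imag
x1 = [complex(c).real for c in x]   # real part of x
x2 = [complex(c).imag for c in x]   # imaginary part of x

# The two equations from the proof: A x1 = lam1 x1 - lam2 x2 and A x2 = lam1 x2 + lam2 x1
print(apply(A, x1) == [lam1 * a - lam2 * b for a, b in zip(x1, x2)])  # True
print(apply(A, x2) == [lam1 * b + lam2 * a for a, b in zip(x1, x2)])  # True
```

Here x_1 = e_1 and x_2 = −e_2. So Z is the xy-plane, a two-dimensional L_A-invariant subspace, as the lemma predicts.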


Theorem 6.46. Let T be an orthogonal operator on a nonzero finite-dimensional real inner product space V. Then there exists a collection of pairwise orthogonal T-invariant subspaces {W_1, W_2, ..., W_m} of V such that

(a) 1 ≤ dim(W_i) ≤ 2 for i = 1, 2, ..., m.
(b) V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_m.

Proof. The proof is by mathematical induction on dim(V). If dim(V) = 1, the result is obvious. So assume that the result is true whenever dim(V) < n for some fixed integer n > 1.
Suppose dim(V) = n. By the lemma, there exists a T-invariant subspace W_1 of V such that 1 ≤ dim(W_1) ≤ 2. If W_1 = V, the result is established. Otherwise, W_1⊥ ≠ {0}. By Exercise 14, W_1⊥ is T-invariant and the restriction of T to W_1⊥ is orthogonal. Since dim(W_1⊥) < n, we may apply the induction hypothesis to T_{W_1⊥} and conclude that there exists a collection of pairwise orthogonal T-invariant subspaces {W_2, W_3, ..., W_m} of W_1⊥ such that 1 ≤ dim(W_i) ≤ 2 for i = 2, 3, ..., m and W_1⊥ = W_2 ⊕ W_3 ⊕ ··· ⊕ W_m. Thus {W_1, W_2, ..., W_m} is pairwise orthogonal, and by Exercise 13(d) of Section 6.2,

$$V = W_1 \oplus W_1^{\perp} = W_1 \oplus W_2 \oplus \cdots \oplus W_m.$$

∎


Applying Example 1 and Theorem 6.45 in the context of Theorem 6.46, we conclude that the restriction of T to W_i is either a rotation or a reflection for each i = 1, 2, ..., m. Thus, in some sense, T is composed of rotations and reflections. Unfortunately, very little can be said about the uniqueness of the decomposition of V in Theorem 6.46. For example, the W_i's, the number m of W_i's, and the number of W_i's for which T_{W_i} is a reflection are not unique. Although the number of W_i's for which T_{W_i} is a reflection is not unique, whether this number is even or odd is an intrinsic property of T. Moreover, we can always decompose V so that T_{W_i} is a reflection for at most one W_i. These facts are established in the following result.


Theorem 6.47. Let T, V, W_1, ..., W_m be as in Theorem 6.46.

(a) The number of W_i's for which T_{W_i} is a reflection is even or odd according to whether det(T) = 1 or det(T) = −1.

(b) It is always possible to decompose V as in Theorem 6.46 so that the number of W_i's for which T_{W_i} is a reflection is zero or one according to whether det(T) = 1 or det(T) = −1. Furthermore, if T_{W_i} is a reflection, then dim(W_i) = 1.


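Proof. (a) Let r denote the number of W_i's in the decomposition for which T_{W_i} is a reflection. Each factor det(T_{W_i}) equals 1 or −1 according to whether T_{W_i} is a rotation or a reflection (Example 1 and Theorem 6.45). Hence, by Exercise 15,

$$\det(T) = \det(T_{W_1}) \cdot \det(T_{W_2}) \cdots \det(T_{W_m}) = (-1)^r,$$

proving (a).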

(b) Let E = {x ∈ V: T(x) = −x}; then E is a T-invariant subspace of V. If W = E⊥, then W is T-invariant. So by applying Theorem 6.46 to T_W, we obtain a collection of pairwise orthogonal T-invariant subspaces {W_1, W_2, ..., W_k} of W such that W = W_1 ⊕ W_2 ⊕ ··· ⊕ W_k and for 1 ≤ i ≤ k, the dimension of each W_i is either 1 or 2. Observe that, for each i = 1, 2, ..., k, T_{W_i} is a rotation. For otherwise, if T_{W_i} is a reflection, there exists a nonzero x ∈ W_i for which T(x) = −x. But then, x ∈ W_i ∩ E ⊆ E⊥ ∩ E = {0}, a contradiction. If E = {0}, the result follows. Otherwise, choose an orthonormal basis β for E containing p vectors (p > 0). It is possible to decompose β into a pairwise disjoint union β = β_1 ∪ β_2 ∪ ··· ∪ β_r such that each β_i contains exactly two vectors for i < r, and β_r contains two vectors if p is even and one vector if p is odd. For each i = 1, 2, ..., r, let W_{k+i} = span(β_i). Then, clearly, {W_1, W_2, ..., W_k, ..., W_{k+r}} is pairwise orthogonal, and

$$V = W_1 \oplus W_2 \oplus \cdots \oplus W_k \oplus \cdots \oplus W_{k+r}. \tag{27}$$


Moreover, if any β_i contains two vectors, then

$$\det(T_{W_{k+i}}) = \det([T_{W_{k+i}}]_{\beta_i}) = \det\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = 1.$$

So T_{W_{k+i}} is a rotation by Theorem 6.45, and hence T_{W_j} is a rotation for j < k + r. If β_r consists of one vector, then dim(W_{k+r}) = 1 and

$$\det(T_{W_{k+r}}) = \det([T_{W_{k+r}}]_{\beta_r}) = \det(-1) = -1.$$

Thus T_{W_{k+r}} is a reflection by Example 1, and we conclude that the decomposition in (27) satisfies the condition of (b). ∎
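Part (a) of Theorem 6.47 rests on the fact that the determinant of a block-diagonal matrix is the product of the determinants of its blocks (Exercise 15). The sketch below assumes T is given by a block-diagonal matrix relative to an orthonormal basis adapted to V = W_1 ⊕ W_2 ⊕ W_3. The sample blocks are our own choice. By Example 1 and Theorem 6.45, the blocks with determinant −1 are exactly the reflections. The code counts them and compares (−1)^r with det(T). It computes determinants exactly with `fractions.Fraction`, and the helper names are ours.

```python
from fractions import Fraction


def block_diag(*blocks):
    """Square block-diagonal matrix with the given square blocks along the diagonal."""
    n = sum(len(b) for b in blocks)
    m = [[0] * n for _ in range(n)]
    k = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                m[k + i][k + j] = value
        k += len(b)
    return m


def det(m):
    """Determinant by exact Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result            # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


# Matrices of T restricted to W_1, W_2, W_3 (relative to orthonormal bases of each W_i)
blocks = [[[0, -1], [1, 0]],   # rotation through 90 degrees (det 1)
          [[1, 0], [0, -1]],   # reflection of a 2-dimensional W_2 (det -1)
          [[-1]]]              # reflection of a 1-dimensional W_3 (det -1)
r = sum(1 for b in blocks if det(b) == -1)     # number of W_i on which T is a reflection
print(r, det(block_diag(*blocks)), (-1) ** r)  # 2 1 1
```

Dropping the 1 × 1 block [−1] leaves r = 1 and det(T) = −1, again in line with part (a).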


As a consequence of the preceding theorem, an orthogonal operator can 
be factored as a product of rotations and reflections. 


Corollary. Let T be an orthogonal operator on a finite-dimensional real 
inner product space V. Then there exists a collection {T1,T2,...,Tm} of 
orthogonal operators on V such that the following statements are true. 

(a) For each i, T_i is either a reflection or a rotation.
(b) For at most one i, T_i is a reflection.

(c) T_iT_j = T_jT_i for all i and j.

(d) T = T_1T_2 ··· T_m.


(e) 

$$\det(T) = \begin{cases} 1 & \text{if } T_i \text{ is a rotation for each } i, \\ -1 & \text{otherwise.} \end{cases}$$


Proof. As in the proof of Theorem 6.47(b), we can write

$$V = W_1 \oplus W_2 \oplus \cdots \oplus W_m,$$

where T_{W_i} is a rotation for i < m. For each i = 1, 2, ..., m, define T_i: V → V by

$$T_i(x_1 + x_2 + \cdots + x_m) = x_1 + x_2 + \cdots + x_{i-1} + T(x_i) + x_{i+1} + \cdots + x_m,$$

where x_j ∈ W_j for all j. It is easily shown that each T_i is an orthogonal operator on V. In fact, T_i is a rotation or a reflection according to whether T_{W_i} is a rotation or a reflection. This establishes (a) and (b). The proofs of (c), (d), and (e) are left as exercises. (See Exercise 16.)
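The operators T_i defined in this proof are easy to write down in coordinates. As an illustration only, suppose T has the block-diagonal matrix below relative to an orthonormal basis adapted to V = W_1 ⊕ W_2 ⊕ W_3. The blocks are our own choice, arranged as in Theorem 6.47(b): two rotations followed by one reflection. Each T_i keeps the ith block of T and replaces every other block by an identity matrix. The sketch then checks statements (c) and (d) of the corollary in this one example.

```python
from functools import reduce


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def block_diag(*blocks):
    """Square block-diagonal matrix with the given square blocks along the diagonal."""
    n = sum(len(b) for b in blocks)
    m = [[0] * n for _ in range(n)]
    k = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                m[k + i][k + j] = value
        k += len(b)
    return m


def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# T restricted to W_1, W_2, W_3, as in Theorem 6.47(b): rotations first, one reflection last
blocks = [[[0, -1], [1, 0]],     # rotation through 90 degrees
          [[-1, 0], [0, -1]],    # rotation through 180 degrees
          [[-1]]]                # the single reflection
T = block_diag(*blocks)

# T_i agrees with T on W_i and is the identity on every other W_j
factors = [block_diag(*[b if j == i else identity(len(b)) for j, b in enumerate(blocks)])
           for i in range(len(blocks))]

print(reduce(matmul, factors) == T)                                         # True: (d)
print(all(matmul(p, q) == matmul(q, p) for p in factors for q in factors))  # True: (c)
```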
Example 3
Orthogonal Operators on a Three-Dimensional Real Inner Product Space


Let T be an orthogonal operator on a three-dimensional real inner product space V. We show that T can be decomposed into the composite of a rotation and at most one reflection. Let

$$V = W_1 \oplus W_2 \oplus \cdots \oplus W_m$$

be a decomposition as in Theorem 6.47(b). Clearly, m = 2 or m = 3.


If m = 2, then V = W_1 ⊕ W_2. Without loss of generality, suppose that dim(W_1) = 1 and dim(W_2) = 2. Thus T_{W_1} is a reflection or the identity on W_1, and T_{W_2} is a rotation. Defining T_1 and T_2 as in the proof of the corollary to Theorem 6.47, we have that T = T_1T_2 is the composite of a rotation and at most one reflection. (Note that if T_{W_1} is not a reflection, then T_1 is the identity on V and T = T_2.)


If m = 3, then V = W_1 ⊕ W_2 ⊕ W_3 and dim(W_i) = 1 for all i. For each i, let T_i be as in the proof of the corollary to Theorem 6.47. If T_{W_i} is not a reflection, then T_i is the identity on V. Otherwise, T_i is a reflection. Since T_{W_i} is a reflection for at most one i, we conclude that T is either a single reflection or the identity (a rotation).
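When det(T) = 1, the argument above shows that T is a rotation and no reflection factor is needed. The sketch below assumes V = R^3 with the standard inner product and recovers the axis and angle of such a rotation from its matrix. It uses two standard facts about a rotation of R^3 through an angle θ about a unit vector u. First, trace(T) = 1 + 2cos θ. Second, the antisymmetric part of the matrix equals 2 sin θ times the cross-product matrix of u. The function name `axis_and_angle` and the sample matrix are ours.

```python
import math


def axis_and_angle(r):
    """Axis (unit vector) and angle of a rotation of R^3 given by the matrix r.

    Assumes r is orthogonal with det(r) = 1 and that the angle theta satisfies
    0 < theta < pi, so sin(theta) != 0.  Uses trace(r) = 1 + 2 cos(theta) and the fact
    that (r32 - r23, r13 - r31, r21 - r12) equals 2 sin(theta) times the unit axis.
    """
    trace = r[0][0] + r[1][1] + r[2][2]
    theta = math.acos(max(-1.0, min(1.0, (trace - 1) / 2)))   # clamp guards rounding
    w = [r[2][1] - r[1][2], r[0][2] - r[2][0], r[1][0] - r[0][1]]
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w], theta


# Cyclic permutation e1 -> e2 -> e3 -> e1: orthogonal with determinant 1
P = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
axis, theta = axis_and_angle(P)
print(axis)                                   # each entry is 1/sqrt(3)
print(math.isclose(theta, 2 * math.pi / 3))   # True
```

The sample matrix P fixes (1, 1, 1), and the code recovers that axis together with the angle 2π/3.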


EXERCISES 


1. Label the following statements as true or false. Assume that the under- 
lying vector spaces are finite-dimensional real inner product spaces. 


(a) Any orthogonal operator is either a rotation or a reflection. 

(b) The composite of any two rotations on a two-dimensional space is 
a rotation. 

(c) The composite of any two rotations on a three-dimensional space 
is a rotation. 

(d) The composite of any two rotations on a four-dimensional space is 
a rotation. 

(e) The identity operator is a rotation. 

(f) The composite of two reflections is a reflection. 

(g) Any orthogonal operator is a composite of rotations. 

(h) For any orthogonal operator T, if det(T) = −1, then T is a reflection.

(i) Reflections always have eigenvalues. 

(j) Rotations always have eigenvalues. 


2. Prove that rotations, reflections, and composites of rotations and re- 
flections are orthogonal operators. 
3. Let

$$A = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$


(a) Prove that L_A is a reflection.

(b) Find the axis in R^2 about which L_A reflects, that is, the subspace of R^2 on which L_A acts as the identity.

(c) Prove that L_{AB} and L_{BA} are rotations.


4. For any real number φ, let

$$A = \begin{pmatrix} \cos\phi & \sin\phi \\ \sin\phi & -\cos\phi \end{pmatrix}.$$

(a) Prove that L_A is a reflection.


(b) Find the axis in R^2 about which L_A reflects.


5. For any real number φ, define T_φ = L_A, where

$$A = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}.$$

(a) Prove that any rotation on R^2 is of the form T_φ for some φ.


(b) Prove that T_φT_ψ = T_{(φ+ψ)} for any φ, ψ ∈ R.
(c) Deduce that any two rotations on R^2 commute.


6. Prove that the composite of any two rotations on R^3 is a rotation on R^3.


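7. Given real numbers φ and ψ, define matrices

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$


(a) Prove that L_A and L_B are rotations.
(b) Prove that L_{AB} is a rotation.
(c) Find the axis of rotation for L_{AB}.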


8. Prove Theorem 6.45 using the hints preceding the statement of the 
theorem. 


9. Prove that no orthogonal operator can be both a rotation and a reflec- 
tion. 
10. Prove that if V is a two- or three-dimensional real inner product space, then the composite of two reflections on V is a rotation of V.


11. Give an example of an orthogonal operator that is neither a reflection nor a rotation.


12. Let V be a finite-dimensional real inner product space. Define T: V → V by T(x) = −x. Prove that T is a product of rotations if and only if dim(V) is even.


13. Complete the proof of the lemma to Theorem 6.46 by showing that W = φ_β^{−1}(Z) satisfies the required conditions.


14. Let T be an orthogonal [unitary] operator on a finite-dimensional real [complex] inner product space V. If W is a T-invariant subspace of V, prove the following results.


(a) T_W is an orthogonal [unitary] operator on W.

(b) W⊥ is a T-invariant subspace of V. Hint: Use the fact that T_W is one-to-one and onto to conclude that, for any y ∈ W, T*(y) = T^{−1}(y) ∈ W.

(c) T_{W⊥} is an orthogonal [unitary] operator on W⊥.


15. Let T be a linear operator on a finite-dimensional vector space V, where V is a direct sum of T-invariant subspaces, say, V = W_1 ⊕ W_2 ⊕ ··· ⊕ W_k. Prove that det(T) = det(T_{W_1}) · det(T_{W_2}) ··· det(T_{W_k}).


16. Complete the proof of the corollary to Theorem 6.47.


17. Let T be an orthogonal operator on an n-dimensional real inner product space V. Suppose that T is not the identity. Prove the following results.


(a) If n is odd, then T can be expressed as the composite of at most one reflection and at most ½(n − 1) rotations.

(b) If n is even, then T can be expressed as the composite of at most ½n rotations or as the composite of one reflection and at most ½(n − 2) rotations.


18. Let V be a real inner product space of dimension 2. For any x, y ∈ V such that x ≠ y and ‖x‖ = ‖y‖ = 1, show that there exists a unique rotation T on V such that T(x) = y.


INDEX OF DEFINITIONS FOR CHAPTER 6 


Adjoint of a linear operator 358

Adjoint of a matrix 331

Axis of rotation 473

Bilinear form 422

Complex inner product space 332

Condition number 469




Congruent matrices 426 

Conjugate transpose (adjoint) of a 
matrix 331 

Critical point 439 

Diagonalizable bilinear form 428 

Fourier coefficients of a vector rela- 
tive to an orthonormal set 348 

Frobenius inner product 332 

Gram-Schmidt orthogonalization 
process 344 

Hessian matrix 440 

Index of a bilinear form 444 

Index of a matrix 445 

Inner product 329 

Inner product space 332 

Invariants of a bilinear form 444 

Invariants of a matrix 445 

Least squares line 361 

Legendre polynomials 346 

Local extremum 439 

Local maximum 439 

Local minimum 439 

Lorentz transformation 454 

Matrix representation of a bilinear form 424

Minimal solution of a system of equations 364

Norm of a matrix 467 

Norm of a vector 333 

Normal matrix 370 

Normal operator 370 

Normalizing a vector 335 

Orthogonal complement of a subset 
of an inner product space 349 

Orthogonally equivalent 
matrices 384 

Orthogonal matrix 382 

Orthogonal operator 379 

Orthogonal projection 398 

Orthogonal projection on a subspace 
351 

Orthogonal subset of an inner prod- 
uct space 335 




Orthogonal vectors 335 

Orthonormal basis 341 

Orthonormal set 335 

Penrose conditions 421 

Permanent of a 2 × 2 matrix 448

Polar decomposition of a matrix 
412 

Pseudoinverse of a linear transforma- 
tion 413 

Pseudoinverse of a matrix 414 

Quadratic form 433 

Rank of a bilinear form 443 

Rayleigh quotient 467 

Real inner product space 332 

Reflection 473 

Resolution of the identity operator 
induced by a linear transformation 
402 

Rigid motion 385 

Rotation 472 

Self-adjoint matrix 373 

Self-adjoint operator 373 

Signature of a form 444 

Signature of a matrix 445 

Singular value decomposition of a 
matrix 410 

Singular value of a linear transforma- 
tion 407 

Singular value of a matrix 410 

Space-time coordinates 453 

Spectral decomposition of a linear 
operator 402 

Spectrum of a linear operator 402 

Standard inner product 330 

Symmetric bilinear form 428 

Translation 386 

Trigonometric polynomial 399 

Unitarily equivalent matrices 384 

Unitary matrix 382 

Unitary operator 379 

Unit vector 335 


Canonical Forms


7.1 The Jordan Canonical Form I
7.2 The Jordan Canonical Form II
7.3 The Minimal Polynomial

7.4* The Rational Canonical Form


As we learned in Chapter 5, the advantage of a diagonalizable linear operator lies in the simplicity of its description. Such an operator has a diagonal matrix representation, or, equivalently, there is an ordered basis for the underlying vector space consisting of eigenvectors of the operator. However, not every linear operator is diagonalizable, even if its characteristic polynomial splits. Example 3 of Section 5.2 describes such an operator.

It is the purpose of this chapter to consider alternative matrix repre- 
sentations for nondiagonalizable operators. These representations are called 
canonical forms. There are different kinds of canonical forms, and their ad- 
vantages and disadvantages depend on how they are applied. The choice of a 
canonical form is determined by the appropriate choice of an ordered basis. 
Naturally, the canonical forms of a linear operator are not diagonal matrices 
if the linear operator is not diagonalizable. 

In this chapter, we treat two common canonical forms. The first of these, 
the Jordan canonical form, requires that the characteristic polynomial of 
the operator splits. This form is always available if the underlying field is 
algebraically closed, that is, if every polynomial with coefficients from the field 
splits. For example, the field of complex numbers is algebraically closed by 
the fundamental theorem of algebra (see Appendix D). The first two sections 
deal with this form. The rational canonical form, treated in Section 7.4, does 
not require such a factorization. 


7.1 THE JORDAN CANONICAL FORM I


Let T be a linear operator on a finite-dimensional vector space V, and suppose 
that the characteristic polynomial of T splits. Recall from Section 5.2 that 
the diagonalizability of T depends on whether the union of ordered bases 
for the distinct eigenspaces of T is an ordered basis for V. So a lack of 
diagonalizability means that at least one eigenspace of T is too “small.” 
In this section, we extend the definition of eigenspace to generalized eigenspace. From these subspaces, we select ordered bases whose union is an ordered basis β for V such that

$$[T]_\beta = \begin{pmatrix} A_1 & O & \cdots & O \\ O & A_2 & \cdots & O \\ \vdots & \vdots & & \vdots \\ O & O & \cdots & A_k \end{pmatrix},$$

where each O is a zero matrix, and each A_i is a square matrix of the form

$$(\lambda) \quad\text{or}\quad \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}$$

for some eigenvalue λ of T. Such a matrix A_i is called a Jordan block corresponding to λ, and the matrix [T]_β is called a Jordan canonical form of T. We also say that the ordered basis β is a Jordan canonical basis for T. Observe that each Jordan block A_i is "almost" a diagonal matrix; in fact, [T]_β is a diagonal matrix if and only if each A_i is of the form (λ).
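Jordan blocks, and the block-diagonal matrices built from them, are easy to generate. This sketch uses plain nested lists, and the helper names (`jordan_block`, `block_diag`, `is_diagonal`) are hypothetical. The last lines illustrate the remark above that [T]_β is diagonal exactly when every block has size 1.

```python
def jordan_block(lam, size):
    """size x size Jordan block: lam on the diagonal, 1 just above it, 0 elsewhere."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(size)]
            for i in range(size)]


def block_diag(*blocks):
    """Square block-diagonal matrix with the given square blocks along the diagonal."""
    n = sum(len(b) for b in blocks)
    m = [[0] * n for _ in range(n)]
    k = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                m[k + i][k + j] = value
        k += len(b)
    return m


def is_diagonal(m):
    return all(m[i][j] == 0 for i in range(len(m)) for j in range(len(m)) if i != j)


print(jordan_block(5, 3))          # [[5, 1, 0], [0, 5, 1], [0, 0, 5]]
form = block_diag(jordan_block(2, 2), jordan_block(7, 1))
print(form)                        # [[2, 1, 0], [0, 2, 0], [0, 0, 7]]
print(is_diagonal(form))           # False: one block has size 2
print(is_diagonal(block_diag(jordan_block(2, 1), jordan_block(7, 1))))   # True
```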


Example 1 


Suppose that T is a linear operator on C^8, and β = {v_1, v_2, ..., v_8} is an ordered basis for C^8 such that

$$J = [T]_\beta = \begin{pmatrix}
2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

is a Jordan canonical form of T. Notice that the characteristic polynomial of T is det(J − tI) = (t − 2)^4(t − 3)^2t^2, and hence the multiplicity of each eigenvalue is the number of times that the eigenvalue appears on the diagonal of J. Also observe that v_1, v_4, v_5, and v_7 are the only vectors in β that are eigenvectors of T. These are the vectors corresponding to the columns of J with no 1 above the diagonal entry.
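Both observations in Example 1 can be checked directly from the matrix J. In this sketch, each v_j is represented by its coordinate vector e_j relative to β. Under that representation, v_j is an eigenvector of T exactly when column j of J is a scalar multiple of e_j.

```python
from collections import Counter

# J = [T]_beta from Example 1
J = [
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 1, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# J is upper triangular, so det(J - tI) is the product of the factors (J[i][i] - t):
# each eigenvalue's multiplicity is its number of appearances on the diagonal.
multiplicities = Counter(J[i][i] for i in range(8))
print(dict(multiplicities))   # {2: 4, 3: 2, 0: 2}

# v_j is an eigenvector exactly when column j of J is zero off the diagonal,
# i.e. when column j has no 1 above its diagonal entry (indices are 1-based below).
eigen_indices = [j + 1 for j in range(8)
                 if all(J[i][j] == 0 for i in range(8) if i != j)]
print(eigen_indices)          # [1, 4, 5, 7]
```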
In Sections 7.1 and 7.2, we prove that every linear operator whose characteristic polynomial splits has a Jordan canonical form that is unique up to the order of the Jordan blocks. Nevertheless, it is not the case that the Jordan canonical form is completely determined by the characteristic polynomial of the operator. For example, let T′ be the linear operator on C^8 such that [T′]_β = J′, where β is the ordered basis in Example 1 and

$$J' = \begin{pmatrix}
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$

Then the characteristic polynomial of T′ is also (t − 2)^4(t − 3)^2t^2. But the operator T′ has the Jordan canonical form J′, which is different from J, the Jordan canonical form of the linear operator T of Example 1.

Consider again the matrix J and the ordered basis β of Example 1. Notice that T(v_2) = v_1 + 2v_2 and therefore, (T − 2I)(v_2) = v_1. Similarly, (T − 2I)(v_3) = v_2. Since v_1 and v_4 are eigenvectors of T corresponding to λ = 2, it follows that (T − 2I)^3(v_i) = 0 for i = 1, 2, 3, and 4. Similarly (T − 3I)^2(v_i) = 0 for i = 5, 6, and (T − 0I)^2(v_i) = 0 for i = 7, 8.

Because of the structure of each Jordan block in a Jordan canonical form, we can generalize these observations: If v lies in a Jordan canonical basis for a linear operator T and is associated with a Jordan block with diagonal entry λ, then (T − λI)^p(v) = 0 for sufficiently large p. Eigenvectors satisfy this condition for p = 1.


Definition. Let T be a linear operator on a vector space V, and let λ be a scalar. A nonzero vector x in V is called a generalized eigenvector of T corresponding to λ if (T − λI)^p(x) = 0 for some positive integer p.


Notice that if x is a generalized eigenvector of T corresponding to λ, and p is the smallest positive integer for which (T − λI)^p(x) = 0, then (T − λI)^{p−1}(x) is an eigenvector of T corresponding to λ. Therefore λ is an eigenvalue of T.

In the context of Example 1, each vector in β is a generalized eigenvector of T. In fact, v_1, v_2, v_3, and v_4 correspond to the scalar 2, v_5 and v_6 correspond to the scalar 3, and v_7 and v_8 correspond to the scalar 0.
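Continuing with the matrix J of Example 1, the following sketch computes, for each v_j, the smallest p with (J − λI)^p e_j = 0. Here λ is the diagonal entry of the block containing v_j. The sketch also returns the eigenvector (J − λI)^{p−1} e_j from the note above. The function name `smallest_power` is ours. For an n × n matrix, trying p ≤ n is enough (compare Theorem 7.2(b)).

```python
# J = [T]_beta from Example 1
J = [
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 1, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def apply(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def smallest_power(m, lam, x):
    """Smallest p >= 1 with (m - lam*I)^p x = 0, together with (m - lam*I)^(p-1) x.

    Assumes x is nonzero.  Returns (None, None) if no p <= n works, in which case
    x is not a generalized eigenvector corresponding to lam.
    """
    n = len(m)
    shifted = [[m[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    current = list(x)
    for p in range(1, n + 1):
        nxt = apply(shifted, current)
        if all(c == 0 for c in nxt):
            return p, current   # current = (m - lam*I)^(p-1) x, an eigenvector for lam
        current = nxt
    return None, None


basis = [[int(i == j) for i in range(8)] for j in range(8)]   # coordinates of v1, ..., v8
powers = [smallest_power(J, J[j][j], basis[j])[0] for j in range(8)]
print(powers)                                          # [1, 2, 3, 1, 1, 2, 1, 2]
print(smallest_power(J, 2, basis[2])[1] == basis[0])   # True: (T - 2I)^2 v3 = v1
```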

Just as eigenvectors lie in eigenspaces, generalized eigenvectors lie in “gen- 
eralized eigenspaces.” 


Definition. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. The generalized eigenspace of T corresponding to λ, denoted K_λ, is the subset of V defined by

$$K_\lambda = \{x \in V : (T - \lambda I)^p(x) = 0 \text{ for some positive integer } p\}.$$

Note that K_λ consists of the zero vector and all generalized eigenvectors corresponding to λ.

Recall that a subspace W of V is T-invariant for a linear operator T if T(W) ⊆ W. In the development that follows, we assume the results of Exercises 3 and 4 of Section 5.4. In particular, for any polynomial g(x), if W is T-invariant, then it is also g(T)-invariant. Furthermore, the range of a linear operator T is T-invariant.


Theorem 7.1. Let T be a linear operator on a vector space V, and let λ be an eigenvalue of T. Then
(a) K_λ is a T-invariant subspace of V containing E_λ (the eigenspace of T corresponding to λ).
(b) For any scalar μ ≠ λ, the restriction of T − μI to K_λ is one-to-one.


Proof. (a) Clearly, 0 ∈ K_λ. Suppose that x and y are in K_λ. Then there exist positive integers p and q such that

$$(T - \lambda I)^p(x) = (T - \lambda I)^q(y) = 0.$$

Therefore

$$\begin{aligned} (T - \lambda I)^{p+q}(x + y) &= (T - \lambda I)^{p+q}(x) + (T - \lambda I)^{p+q}(y) \\ &= (T - \lambda I)^q(0) + (T - \lambda I)^p(0) \\ &= 0, \end{aligned}$$

and hence x + y ∈ K_λ. The proof that K_λ is closed under scalar multiplication is straightforward.

To show that K_λ is T-invariant, consider any x ∈ K_λ. Choose a positive integer p such that (T − λI)^p(x) = 0. Then

$$(T - \lambda I)^p T(x) = T(T - \lambda I)^p(x) = T(0) = 0.$$

Therefore T(x) ∈ K_λ.

Finally, it is a simple observation that E_λ is contained in K_λ.

(b) Let x ∈ K_λ and (T − μI)(x) = 0. By way of contradiction, suppose that x ≠ 0. Let p be the smallest integer for which (T − λI)^p(x) = 0, and let y = (T − λI)^{p−1}(x). Then

$$(T - \lambda I)(y) = (T - \lambda I)^p(x) = 0,$$

and hence y ∈ E_λ. Furthermore,

$$(T - \mu I)(y) = (T - \mu I)(T - \lambda I)^{p-1}(x) = (T - \lambda I)^{p-1}(T - \mu I)(x) = 0,$$

so that y ∈ E_μ. But E_λ ∩ E_μ = {0}, and thus y = 0, contrary to the choice of p. So x = 0, and the restriction of T − μI to K_λ is one-to-one.
Theorem 7.2. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Suppose that λ is an eigenvalue of T with multiplicity m. Then

(a) dim(K_λ) ≤ m.
(b) K_λ = N((T − λI)^m).


Proof. (a) Let W = K_λ, and let h(t) be the characteristic polynomial of T_W. By Theorem 5.21 (p. 314), h(t) divides the characteristic polynomial of T, and by Theorem 7.1(b), λ is the only eigenvalue of T_W. Hence h(t) = (−1)^d(t − λ)^d, where d = dim(W), and d ≤ m.

(b) Clearly N((T − λI)^m) ⊆ K_λ. Now let W and h(t) be as in (a). Then h(T_W) is identically zero by the Cayley-Hamilton theorem (p. 317); therefore (T − λI)^d(x) = 0 for all x ∈ W. Since d ≤ m, we have K_λ ⊆ N((T − λI)^m).
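Part (b) of Theorem 7.2 gives a finite procedure for computing dim(K_λ): it equals the nullity of (A − λI)^m, where A represents T and m is the multiplicity of λ. The sketch below carries this out for the matrix J of Example 1. It uses exact rational arithmetic (`fractions.Fraction`), so no rounding tolerance is needed. The helper names are ours. In this example each dim(K_λ) equals m, which Theorem 7.4(c) below shows always happens.

```python
from fractions import Fraction

# J = [T]_beta from Example 1
J = [
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 1, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def rank(m):
    """Rank of m by exact Gauss-Jordan elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    rows, cols, r = len(a), len(a[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(rows):
            if i != r and a[i][c] != 0:
                f = a[i][c] / a[r][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return r


def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def nullity_of_power(a, lam, e):
    """dim N((A - lam*I)^e), computed as n - rank((A - lam*I)^e)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # identity matrix
    for _ in range(e):
        power = matmul(power, shifted)
    return n - rank(power)


multiplicities = {2: 4, 3: 2, 0: 2}   # read off the diagonal of J
print({lam: nullity_of_power(J, lam, m) for lam, m in multiplicities.items()})
# {2: 4, 3: 2, 0: 2}: dim(K_lam) equals the multiplicity of each eigenvalue
```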


Theorem 7.3. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let λ_1, λ_2, ..., λ_k be the distinct eigenvalues of T. Then, for every x ∈ V, there exist vectors v_i ∈ K_{λ_i}, 1 ≤ i ≤ k, such that

$$x = v_1 + v_2 + \cdots + v_k.$$


Proof. The proof is by mathematical induction on the number k of distinct eigenvalues of T. First suppose that k = 1, and let m be the multiplicity of λ_1. Then (λ_1 − t)^m is the characteristic polynomial of T, and hence (λ_1I − T)^m = T_0 by the Cayley-Hamilton theorem (p. 317). Thus V = K_{λ_1}, and the result follows.

Now suppose that for some integer k > 1, the result is established whenever T has fewer than k distinct eigenvalues, and suppose that T has k distinct eigenvalues. Let m be the multiplicity of λ_k, and let f(t) be the characteristic polynomial of T. Then f(t) = (t − λ_k)^m g(t) for some polynomial g(t) not divisible by (t − λ_k). Let W = R((T − λ_kI)^m). Clearly W is T-invariant. Observe that (T − λ_kI)^m maps K_{λ_i} onto itself for i < k. For suppose that i < k. Since (T − λ_kI)^m maps K_{λ_i} into itself and λ_k ≠ λ_i, the restriction of T − λ_kI to K_{λ_i} is one-to-one (by Theorem 7.1(b)) and hence is onto. One consequence of this is that for i < k, K_{λ_i} is contained in W; hence λ_i is an eigenvalue of T_W for i < k.

Next, observe that λ_k is not an eigenvalue of T_W. For suppose that T(v) = λ_kv for some v ∈ W. Then v = (T − λ_kI)^m(y) for some y ∈ V, and it follows that

$$0 = (T - \lambda_k I)(v) = (T - \lambda_k I)^{m+1}(y).$$

Therefore y ∈ K_{λ_k}. So by Theorem 7.2, v = (T − λ_kI)^m(y) = 0.
Since every eigenvalue of T_W is an eigenvalue of T, the distinct eigenvalues of T_W are λ_1, λ_2, ..., λ_{k−1}.
Now let x ∈ V. Then (T − λ_kI)^m(x) ∈ W. Since T_W has the k − 1 distinct eigenvalues λ_1, λ_2, ..., λ_{k−1}, the induction hypothesis applies. Hence there are vectors w_i ∈ K′_{λ_i}, 1 ≤ i ≤ k − 1, such that

$$(T - \lambda_k I)^m(x) = w_1 + w_2 + \cdots + w_{k-1},$$

where K′_{λ_i} denotes the generalized eigenspace of T_W corresponding to λ_i. Since K′_{λ_i} ⊆ K_{λ_i} for i < k and (T − λ_kI)^m maps K_{λ_i} onto itself for i < k, there exist vectors v_i ∈ K_{λ_i} such that (T − λ_kI)^m(v_i) = w_i for i < k. Thus we have

$$(T - \lambda_k I)^m(x) = (T - \lambda_k I)^m(v_1) + (T - \lambda_k I)^m(v_2) + \cdots + (T - \lambda_k I)^m(v_{k-1}),$$

and it follows that (T − λ_kI)^m annihilates x − (v_1 + v_2 + ··· + v_{k−1}), so that this vector lies in K_{λ_k}. Therefore there exists a vector v_k ∈ K_{λ_k} such that

$$x = v_1 + v_2 + \cdots + v_k.$$

∎


The next result extends Theorem 5.9(b) (p. 268) to all linear operators whose characteristic polynomials split. In this case, the eigenspaces are replaced by generalized eigenspaces.


Theorem 7.4. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits, and let λ_1, λ_2, ..., λ_k be the distinct eigenvalues of T with corresponding multiplicities m_1, m_2, ..., m_k. For 1 ≤ i ≤ k, let β_i be an ordered basis for K_{λ_i}. Then the following statements are true.

(a) β_i ∩ β_j = ∅ for i ≠ j.
(b) β = β_1 ∪ β_2 ∪ ··· ∪ β_k is an ordered basis for V.
(c) dim(K_{λ_i}) = m_i for all i.


Proof. (a) Suppose that x ∈ β_i ∩ β_j ⊆ K_{λ_i} ∩ K_{λ_j}, where i ≠ j. By Theorem 7.1(b), T − λ_iI is one-to-one on K_{λ_j}, and therefore (T − λ_iI)^p(x) ≠ 0 for any positive integer p. But this contradicts the fact that x ∈ K_{λ_i}, and the result follows.

(b) Let x ∈ V. By Theorem 7.3, for 1 ≤ i ≤ k, there exist vectors v_i ∈ K_{λ_i} such that x = v_1 + v_2 + ··· + v_k. Since each v_i is a linear combination of the vectors of β_i, it follows that x is a linear combination of the vectors of β. Therefore β spans V. Let q be the number of vectors in β. Then dim(V) ≤ q. For each i, let d_i = dim(K_{λ_i}). Then, by (a) and Theorem 7.2(a),

$$q = \sum_{i=1}^{k} d_i \le \sum_{i=1}^{k} m_i = \dim(V).$$

Hence q = dim(V). Consequently β is a basis for V by Corollary 2 to the replacement theorem (p. 47).
(c) Using the notation and result of (b), we see that

$$\sum_{i=1}^{k} d_i = \sum_{i=1}^{k} m_i.$$

But d_i ≤ m_i by Theorem 7.2(a), and therefore d_i = m_i for all i. ∎


Corollary. Let T be a linear operator on a finite-dimensional vector space V such that the characteristic polynomial of T splits. Then T is diagonalizable if and only if E_λ = K_λ for every eigenvalue λ of T.


Proof. Combining Theorems 7.4 and 5.9(a) (p. 268), we see that T is diagonalizable if and only if dim(E_λ) = dim(K_λ) for each eigenvalue λ of T. But E_λ ⊆ K_λ, and hence these subspaces have the same dimension if and only if they are equal. ∎
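The corollary yields a dimension test for diagonalizability. Since E_λ ⊆ K_λ = N((T − λI)^m), it suffices to compare the nullities of A − λI and (A − λI)^m. The sketch below assumes the eigenvalues and their multiplicities are already known and passed in by the caller, so that the characteristic polynomial splits. The two 2 × 2 sample matrices and the helper names are ours.

```python
from fractions import Fraction


def rank(m):
    """Rank of m by exact Gauss-Jordan elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    rows, cols, r = len(a), len(a[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(rows):
            if i != r and a[i][c] != 0:
                f = a[i][c] / a[r][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return r


def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def nullity_of_power(a, lam, e):
    """dim N((A - lam*I)^e)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        power = matmul(power, shifted)
    return n - rank(power)


def is_diagonalizable(a, multiplicities):
    """Corollary to Theorem 7.4: diagonalizable iff E_lam = K_lam for every eigenvalue lam.

    multiplicities maps each eigenvalue to its multiplicity (assumed known, summing to n).
    Because E_lam is contained in K_lam, equality of the subspaces reduces to equal dimensions.
    """
    return all(nullity_of_power(a, lam, 1) == nullity_of_power(a, lam, m)
               for lam, m in multiplicities.items())


A = [[3, 1], [-1, 1]]   # characteristic polynomial (t - 2)^2
B = [[1, 2], [2, 1]]    # characteristic polynomial (t - 3)(t + 1)
print(is_diagonalizable(A, {2: 2}))          # False: dim E_2 = 1 but dim K_2 = 2
print(is_diagonalizable(B, {3: 1, -1: 1}))   # True
```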


We now focus our attention on the problem of selecting suitable bases for the generalized eigenspaces of a linear operator so that we may use Theorem 7.4 to obtain a Jordan canonical basis for the operator. For this purpose, we consider again the basis β of Example 1. We have seen that the first four vectors of β lie in the generalized eigenspace K_2. Observe that the vectors in β that determine the first Jordan block of J are of the form

$$\{v_1, v_2, v_3\} = \{(T - 2I)^2(v_3),\ (T - 2I)(v_3),\ v_3\}.$$

Furthermore, observe that (T − 2I)^3(v_3) = 0. The relation between these vectors is the key to finding Jordan canonical bases. This leads to the following definitions.


Definitions. Let T be a linear operator on a vector space V, and let x be a generalized eigenvector of T corresponding to the eigenvalue λ. Suppose that p is the smallest positive integer for which (T − λI)^p(x) = 0. Then the ordered set

$$\{(T - \lambda I)^{p-1}(x),\ (T - \lambda I)^{p-2}(x),\ \ldots,\ (T - \lambda I)(x),\ x\}$$

is called a cycle of generalized eigenvectors of T corresponding to λ. The vectors (T − λI)^{p−1}(x) and x are called the initial vector and the end vector of the cycle, respectively. We say that the length of the cycle is p.
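Generating a cycle from its end vector is a direct loop. Apply A − λI until the zero vector appears, then list the vectors in reverse order so that the initial vector comes first. The sketch below reproduces the cycles {v_1, v_2, v_3} and {v_7, v_8} of Example 1, again identifying v_j with e_j. The function name `cycle` is ours.

```python
# J = [T]_beta from Example 1
J = [
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 1, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def apply(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def cycle(a, lam, x):
    """Cycle with end vector x, initial vector first: [(A - lam*I)^(p-1) x, ..., (A - lam*I) x, x].

    Assumes x is nonzero; raises ValueError if x is not a generalized eigenvector for lam.
    """
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    vectors = [list(x)]
    while True:
        nxt = apply(shifted, vectors[-1])
        if all(c == 0 for c in nxt):
            return vectors[::-1]
        vectors.append(nxt)
        if len(vectors) > n:   # p <= n for a generalized eigenvector of an n x n matrix
            raise ValueError("x is not a generalized eigenvector for lam")


e = [[int(i == j) for i in range(8)] for j in range(8)]   # e[j] holds the coordinates of v_{j+1}
print(cycle(J, 2, e[2]) == [e[0], e[1], e[2]])   # True: {v1, v2, v3}
print(cycle(J, 0, e[7]) == [e[6], e[7]])         # True: {v7, v8}
```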


Notice that the initial vector of a cycle of generalized eigenvectors of a linear operator T is the only eigenvector of T in the cycle. Also observe that if x is an eigenvector of T corresponding to the eigenvalue λ, then the set {x} is a cycle of generalized eigenvectors of T corresponding to λ of length 1.

In Example 1, the subsets β_1 = {v_1, v_2, v_3}, β_2 = {v_4}, β_3 = {v_5, v_6}, and β_4 = {v_7, v_8} are the cycles of generalized eigenvectors of T that occur in β. Notice that β is a disjoint union of these cycles. Furthermore, setting W_i = span(β_i) for 1 ≤ i ≤ 4, we see that β_i is a basis for W_i and [T_{W_i}]_{β_i} is the ith Jordan block of the Jordan canonical form of T. This is precisely the condition that is required for a Jordan canonical basis.
Theorem 7.5. Let T be a linear operator on a finite-dimensional vector space V whose characteristic polynomial splits, and suppose that β is a basis for V such that β is a disjoint union of cycles of generalized eigenvectors of T. Then the following statements are true.

(a) For each cycle γ of generalized eigenvectors contained in β, W = span(γ) is T-invariant, and [T_W]_γ is a Jordan block.
(b) β is a Jordan canonical basis for V.


Proof. (a) Suppose that γ corresponds to λ, γ has length p, and x is the end vector of γ. Then γ = {v_1, v_2, ..., v_p}, where

$$v_i = (T - \lambda I)^{p-i}(x) \text{ for } i < p \quad\text{and}\quad v_p = x.$$

So

$$(T - \lambda I)(v_1) = (T - \lambda I)^p(x) = 0,$$

and hence T(v_1) = λv_1. For i > 1,

$$(T - \lambda I)(v_i) = (T - \lambda I)^{p-(i-1)}(x) = v_{i-1},$$

that is, T(v_i) = v_{i−1} + λv_i. Therefore T maps W into itself, and, by the preceding equations, we see that [T_W]_γ is a Jordan block.

For (b), simply repeat the arguments of (a) for each cycle in β in order to obtain [T]_β. We leave the details as an exercise. ∎
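Theorem 7.5(a) can be checked numerically: written in a cycle basis, the operator becomes a Jordan block. In the sketch below, the sample matrix A = [[3, 1], [−1, 1]] (our choice) has characteristic polynomial (t − 2)^2. The end vector x = e_1 gives the cycle {(A − 2I)e_1, e_1} = {(1, −1), (1, 0)}, which here spans all of R^2. The helper `matrix_in_basis` computes coordinates by exact Gauss-Jordan elimination and assumes the given vectors form a basis of R^n. The helper names are ours.

```python
from fractions import Fraction


def apply(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def solve(q, b):
    """Solve q y = b exactly for an invertible square matrix q (Gauss-Jordan elimination)."""
    n = len(q)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(q, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def matrix_in_basis(a, basis):
    """[L_A]_basis: column j holds the coordinates of A v_j relative to basis = [v_1, ..., v_n]."""
    q = [[v[i] for v in basis] for i in range(len(a))]   # basis vectors as columns
    columns = [solve(q, apply(a, v)) for v in basis]
    return [[columns[j][i] for j in range(len(basis))] for i in range(len(basis))]


A = [[3, 1], [-1, 1]]      # characteristic polynomial (t - 2)^2
gamma = [[1, -1], [1, 0]]  # the cycle {(A - 2I)e1, e1}, initial vector first
print(matrix_in_basis(A, gamma) == [[2, 1], [0, 2]])   # True: a single Jordan block
```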


In view of this result, we must show that, under appropriate conditions, 
there exist bases that are disjoint unions of cycles of generalized eigenvectors. 
Since the characteristic polynomial of a Jordan canonical form splits, this is 
a necessary condition. We will soon see that it is also sufficient. The next 
result moves us toward the desired existence theorem. 


Theorem 7.6. Let $T$ be a linear operator on a vector space $V$, and let $\lambda$ be an eigenvalue of $T$. Suppose that $\gamma_1, \gamma_2, \ldots, \gamma_q$ are cycles of generalized eigenvectors of $T$ corresponding to $\lambda$ such that the initial vectors of the $\gamma_i$'s are distinct and form a linearly independent set. Then the $\gamma_i$'s are disjoint, and their union $\gamma = \bigcup_{i=1}^{q} \gamma_i$ is linearly independent.


Proof. Exercise 5 shows that the $\gamma_i$'s are disjoint.

The proof that $\gamma$ is linearly independent is by mathematical induction on the number of vectors in $\gamma$. If this number is less than 2, then the result is clear. So assume that, for some integer $n > 1$, the result is valid whenever $\gamma$ has fewer than $n$ vectors, and suppose that $\gamma$ has exactly $n$ vectors. Let $W$ be the subspace of $V$ generated by $\gamma$. Clearly $W$ is $(T - \lambda I)$-invariant, and $\dim(W) \le n$. Let $U$ denote the restriction of $T - \lambda I$ to $W$.

For each $i$, let $\gamma_i'$ denote the cycle obtained from $\gamma_i$ by deleting the end vector. Note that if $\gamma_i$ has length one, then $\gamma_i' = \emptyset$. In the case that $\gamma_i' \ne \emptyset$, each vector of $\gamma_i'$ is the image under $U$ of a vector in $\gamma_i$, and conversely, every nonzero image under $U$ of a vector of $\gamma_i$ is contained in $\gamma_i'$. Let $\gamma' = \bigcup_i \gamma_i'$. Then by the last statement, $\gamma'$ generates $R(U)$. Furthermore, $\gamma'$ consists of $n - q$ vectors, and the initial vectors of the $\gamma_i'$'s are also initial vectors of the $\gamma_i$'s. Thus we may apply the induction hypothesis to conclude that $\gamma'$ is linearly independent. Therefore $\gamma'$ is a basis for $R(U)$. Hence $\dim(R(U)) = n - q$. Since the $q$ initial vectors of the $\gamma_i$'s form a linearly independent set and lie in $N(U)$, we have $\dim(N(U)) \ge q$. From these inequalities and the dimension theorem, we obtain

$$n \ge \dim(W) = \dim(R(U)) + \dim(N(U)) \ge (n - q) + q = n.$$

We conclude that $\dim(W) = n$. Since $\gamma$ generates $W$ and consists of $n$ vectors, it must be a basis for $W$. Hence $\gamma$ is linearly independent. ■
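Theorem 7.6 can be spot-checked numerically. The Python sketch below is our own illustration (`rank` and `union_is_independent` are hypothetical helper names); it computes ranks with exact `Fraction` arithmetic. For two cycles of the eigenvalue 2 taken from Example 2 of Section 7.2, it confirms that the union is linearly independent when the initial vectors are, and shows that independence can fail when the initial vectors are dependent. A check of this kind illustrates the theorem; it does not prove it.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def union_is_independent(cycles):
    """True if the union of the cycles (lists of vectors) is linearly independent."""
    vectors = [v for cyc in cycles for v in cyc]
    return rank(vectors) == len(vectors)

# Cycles for the eigenvalue 2 of the 4x4 matrix in Example 2 of Section 7.2
# (initial vector listed first in each cycle).
gamma1 = [[-1, -1, -1, -1], [0, 1, 2, 0]]
gamma2 = [[1, 0, 0, 0]]
print(rank([gamma1[0], gamma2[0]]))                    # 2: initial vectors independent
print(union_is_independent([gamma1, gamma2]))          # True, as Theorem 7.6 predicts
print(union_is_independent([gamma1, [[2, 2, 2, 2]]]))  # False: initial vectors dependent
```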


Corollary. Every cycle of generalized eigenvectors of a linear operator is 
linearly independent. 


Theorem 7.7. Let $T$ be a linear operator on a finite-dimensional vector space $V$, and let $\lambda$ be an eigenvalue of $T$. Then $K_\lambda$ has an ordered basis consisting of a union of disjoint cycles of generalized eigenvectors corresponding to $\lambda$.


Proof. The proof is by mathematical induction on $n = \dim(K_\lambda)$. The result is clear for $n = 1$. So suppose that for some integer $n > 1$ the result is valid whenever $\dim(K_\lambda) < n$, and assume that $\dim(K_\lambda) = n$. Let $U$ denote the restriction of $T - \lambda I$ to $K_\lambda$. Then $R(U)$ is a subspace of $K_\lambda$ of lesser dimension, and $R(U)$ is the space of generalized eigenvectors corresponding to $\lambda$ for the restriction of $T$ to $R(U)$. Therefore, by the induction hypothesis, there exist disjoint cycles $\gamma_1, \gamma_2, \ldots, \gamma_q$ of generalized eigenvectors of this restriction, and hence of $T$ itself, corresponding to $\lambda$ for which $\gamma = \bigcup_{i=1}^{q} \gamma_i$ is a basis for $R(U)$.

For $1 \le i \le q$, the end vector of $\gamma_i$ is the image under $U$ of a vector $v_i \in K_\lambda$, and so we can extend each $\gamma_i$ to a larger cycle $\tilde{\gamma}_i = \gamma_i \cup \{v_i\}$ of generalized eigenvectors of $T$ corresponding to $\lambda$. For $1 \le i \le q$, let $w_i$ be the initial vector of $\tilde{\gamma}_i$ (and hence of $\gamma_i$). Since $\{w_1, w_2, \ldots, w_q\}$ is a linearly independent subset of $E_\lambda$, this set can be extended to a basis $\{w_1, w_2, \ldots, w_q, u_1, u_2, \ldots, u_s\}$ for $E_\lambda$. Then $\tilde{\gamma}_1, \tilde{\gamma}_2, \ldots, \tilde{\gamma}_q, \{u_1\}, \{u_2\}, \ldots, \{u_s\}$ are disjoint cycles of generalized eigenvectors of $T$ corresponding to $\lambda$ such that the initial vectors of these cycles are linearly independent. Therefore their union $\tilde{\gamma}$ is a linearly independent subset of $K_\lambda$ by Theorem 7.6.

We show that $\tilde{\gamma}$ is a basis for $K_\lambda$. Suppose that $\gamma$ consists of $r = \operatorname{rank}(U)$ vectors. Then $\tilde{\gamma}$ consists of $r + q + s$ vectors. Furthermore, since $\{w_1, w_2, \ldots, w_q, u_1, u_2, \ldots, u_s\}$ is a basis for $E_\lambda = N(U)$, it follows that $\operatorname{nullity}(U) = q + s$. Therefore

$$\dim(K_\lambda) = \operatorname{rank}(U) + \operatorname{nullity}(U) = r + q + s.$$

So $\tilde{\gamma}$ is a linearly independent subset of $K_\lambda$ containing $\dim(K_\lambda)$ vectors. It follows that $\tilde{\gamma}$ is a basis for $K_\lambda$. ■


The following corollary is immediate. 


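Corollary 1. Let $T$ be a linear operator on a finite-dimensional vector space $V$ whose characteristic polynomial splits. Then $T$ has a Jordan canonical form.

Proof. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct eigenvalues of $T$. By Theorem 7.7, for each $i$ there is an ordered basis $\beta_i$ consisting of a disjoint union of cycles of generalized eigenvectors corresponding to $\lambda_i$. Let $\beta = \beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$. Then, by Theorem 7.4(b), $\beta$ is an ordered basis for $V$. Since $\beta$ is a disjoint union of cycles of generalized eigenvectors of $T$, Theorem 7.5(b) shows that $\beta$ is a Jordan canonical basis for $V$. ■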


The Jordan canonical form also can be studied from the viewpoint of 
matrices. 


Definition. Let $A \in M_{n\times n}(F)$ be such that the characteristic polynomial of $A$ (and hence of $L_A$) splits. Then the Jordan canonical form of $A$ is defined to be the Jordan canonical form of the linear operator $L_A$ on $F^n$.


The next result is an immediate consequence of this definition and Corollary 1.


Corollary 2. Let $A$ be an $n \times n$ matrix whose characteristic polynomial splits. Then $A$ has a Jordan canonical form $J$, and $A$ is similar to $J$.

Proof. Exercise. ■


We can now compute the Jordan canonical forms of matrices and linear 
operators in some simple cases, as is illustrated in the next two examples. 
The tools necessary for computing the Jordan canonical forms in general are 
developed in the next section. 
Example 2
Let

$$A = \begin{pmatrix} 3 & 1 & -2 \\ -1 & 0 & 5 \\ -1 & -1 & 4 \end{pmatrix} \in M_{3\times 3}(R).$$


To find the Jordan canonical form for $A$, we need to find a Jordan canonical basis for $T = L_A$.


The characteristic polynomial of $A$ is

$$f(t) = \det(A - tI) = -(t - 3)(t - 2)^2.$$


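Hence $\lambda_1 = 3$ and $\lambda_2 = 2$ are the eigenvalues of $A$ with multiplicities 1 and 2, respectively. By Theorem 7.4, $\dim(K_{\lambda_1}) = 1$, and $\dim(K_{\lambda_2}) = 2$. By Theorem 7.2, $K_{\lambda_1} = N(T - 3I)$, and $K_{\lambda_2} = N((T - 2I)^2)$. Since $E_{\lambda_1} = N(T - 3I)$, we have that $E_{\lambda_1} = K_{\lambda_1}$. Observe that $(-1, 2, 1)$ is an eigenvector of $T$ corresponding to $\lambda_1 = 3$; therefore

$$\beta_1 = \left\{ \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix} \right\}$$

is a basis for $K_{\lambda_1}$.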


Since $\dim(K_{\lambda_2}) = 2$ and a generalized eigenspace has a basis consisting of a union of cycles, this basis is either a union of two cycles of length 1 or a single cycle of length 2. The former case is impossible because the vectors in the basis would be eigenvectors, contradicting the fact that $\dim(E_{\lambda_2}) = 1$ (indeed, $\operatorname{rank}(A - 2I) = 2$). Therefore the desired basis is a single cycle of length 2. A vector $v$ is the end vector of such a cycle if and only if $(A - 2I)v \ne 0$, but $(A - 2I)^2 v = 0$. It can easily be shown that

$$\left\{ \begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ 0 \end{pmatrix} \right\}$$

is a basis for the solution space of the homogeneous system $(A - 2I)^2 x = 0$. Now choose a vector $v$ in this set so that $(A - 2I)v \ne 0$. The vector $v = (-1, 2, 0)$ is an acceptable candidate for $v$. Since $(A - 2I)v = (1, -3, -1)$, we obtain the cycle of generalized eigenvectors
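$$\beta_2 = \{(A - 2I)v,\ v\} = \left\{ \begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ 0 \end{pmatrix} \right\}$$

as a basis for $K_{\lambda_2}$. Finally, we take the union of these two bases to obtain

$$\beta = \beta_1 \cup \beta_2 = \left\{ \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ 0 \end{pmatrix} \right\},$$

which is a Jordan canonical basis for $A$. Therefore,

$$J = [T]_\beta = \left(\begin{array}{c|cc} 3 & 0 & 0 \\ \hline 0 & 2 & 1 \\ 0 & 0 & 2 \end{array}\right)$$

is a Jordan canonical form for $A$. Notice that $A$ is similar to $J$. In fact, $J = Q^{-1}AQ$, where $Q$ is the matrix whose columns are the vectors in $\beta$. ♦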
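The closing claim $J = Q^{-1}AQ$ can be checked without inverting $Q$: because $Q$ is invertible, it is equivalent to $AQ = QJ$. A minimal Python sketch of that check, with the matrices of this example typed in by hand (`matmul` and `det3` are our own helpers):

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[3, 1, -2], [-1, 0, 5], [-1, -1, 4]]
beta = [[-1, 2, 1], [1, -3, -1], [-1, 2, 0]]    # Jordan canonical basis, in order
Q = [list(col) for col in zip(*beta)]           # the columns of Q are the vectors of beta
J = [[3, 0, 0], [0, 2, 1], [0, 0, 2]]

print(det3(Q))                         # -1 (nonzero, so Q is invertible)
print(matmul(A, Q) == matmul(Q, J))    # True, hence J = Q^{-1} A Q
```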


Example 3 


Let $T$ be the linear operator on $P_2(R)$ defined by $T(g(x)) = -g(x) - g'(x)$. We find a Jordan canonical form of $T$ and a Jordan canonical basis for $T$.


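Let $\beta$ be the standard ordered basis for $P_2(R)$. Then

$$A = [T]_\beta = \begin{pmatrix} -1 & -1 & 0 \\ 0 & -1 & -2 \\ 0 & 0 & -1 \end{pmatrix},$$

which has the characteristic polynomial $f(t) = -(t + 1)^3$. Thus $\lambda = -1$ is the only eigenvalue of $T$, and hence $K_\lambda = P_2(R)$ by Theorem 7.4. So $\beta$ is a basis for $K_\lambda$. Now

$$\dim(E_\lambda) = 3 - \operatorname{rank}(A + I) = 3 - \operatorname{rank}\begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -2 \\ 0 & 0 & 0 \end{pmatrix} = 3 - 2 = 1.$$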


Therefore a basis for $K_\lambda$ cannot be a union of two or three cycles because the initial vector of each cycle is an eigenvector, and there do not exist two or more linearly independent eigenvectors. So the desired basis must consist of a single cycle of length 3. If $\gamma$ is such a cycle, then $\gamma$ determines a single Jordan block

$$[T]_\gamma = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix},$$

which is a Jordan canonical form of $T$.


The end vector $h(x)$ of such a cycle must satisfy $(T + I)^2(h(x)) \ne 0$. In any basis for $K_\lambda$, there must be a vector that satisfies this condition, or else no vector in $K_\lambda$ satisfies this condition, contrary to our reasoning. Testing the vectors in $\beta$, we see that $h(x) = x^2$ is acceptable. Therefore

$$\gamma = \{(T + I)^2(x^2),\ (T + I)(x^2),\ x^2\} = \{2, -2x, x^2\}$$

is a Jordan canonical basis for $T$. ♦
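Because $(T + I)(g) = -g'$, the cycle in this example can be generated mechanically. The sketch below is our own illustration: it stores $a_0 + a_1x + a_2x^2$ as the list `[a0, a1, a2]` (an encoding assumed for the sketch), repeatedly applies $T + I$ to a chosen end vector, and lists the cycle with its initial vector first. With end vector $x^2$ it reproduces $\{2, -2x, x^2\}$; the end vector $x$ yields a cycle of length only 2, so it fails the condition $(T + I)^2(h(x)) \ne 0$.

```python
def t_plus_i(g):
    """Apply T + I, where T(g) = -g - g', so (T + I)(g) = -g'.

    g = [a0, a1, a2] encodes a0 + a1*x + a2*x^2; the result has the same length."""
    derivative = [k * g[k] for k in range(1, len(g))] + [0]
    return [-c for c in derivative]

def cycle_with_end_vector(h):
    """Return [(T+I)^(p-1) h, ..., (T+I) h, h], stopping before the zero polynomial."""
    vecs = [h]
    while any(t_plus_i(vecs[0])):
        vecs.insert(0, t_plus_i(vecs[0]))
    return vecs

print(cycle_with_end_vector([0, 0, 1]))   # [[2, 0, 0], [0, -2, 0], [0, 0, 1]], i.e. {2, -2x, x^2}
print(cycle_with_end_vector([0, 1, 0]))   # [[-1, 0, 0], [0, 1, 0]], i.e. {-1, x}: too short
```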


In the next section, we develop a computational approach for finding a 
Jordan canonical form and a Jordan canonical basis. In the process, we prove 
that Jordan canonical forms are unique up to the order of the Jordan blocks. 

Let T be a linear operator on a finite-dimensional vector space V, and sup- 
pose that the characteristic polynomial of T splits. By Theorem 5.11 (p. 278), 
T is diagonalizable if and only if V is the direct sum of the eigenspaces of T. 
If T is diagonalizable, then the eigenspaces and the generalized eigenspaces 
coincide. The next result, which is optional, extends Theorem 5.11 to the 
nondiagonalizable case. 


Theorem 7.8. Let T be a linear operator on a finite-dimensional vector 
space V whose characteristic polynomial splits. Then V is the direct sum of 
the generalized eigenspaces of T. 


Proof. Exercise. ■
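Theorem 7.8 implies in particular that the dimensions of the generalized eigenspaces add up to $\dim(V)$. The Python sketch below checks this dimension count for the matrix of Example 2; it illustrates the theorem and is not a proof. It computes $\dim(K_\lambda)$ as $\operatorname{nullity}((A - \lambda I)^n)$ for an $n \times n$ matrix. This is valid because $K_\lambda = N((T - \lambda I)^m)$ with $m \le n$ the multiplicity of $\lambda$ (Theorem 7.2), and $N((T - \lambda I)^m) \subseteq N((T - \lambda I)^n) \subseteq K_\lambda$. The helper names are our own.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def shift(A, lam):
    """Return the matrix A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def generalized_eigenspace_dim(A, lam):
    """dim K_lam = nullity((A - lam I)^n) for an n x n matrix A."""
    n = len(A)
    N = shift(A, lam)
    P = N
    for _ in range(n - 1):
        P = matmul(P, N)        # after the loop, P = (A - lam I)^n
    return n - rank(P)

A = [[3, 1, -2], [-1, 0, 5], [-1, -1, 4]]        # Example 2: eigenvalues 3 and 2
dims = {lam: generalized_eigenspace_dim(A, lam) for lam in (3, 2)}
print(dims)                            # {3: 1, 2: 2}
print(sum(dims.values()) == len(A))    # True
```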


EXERCISES 


1. Label the following statements as true or false. 


(a) Eigenvectors of a linear operator T are also generalized eigenvec- 
tors of T. 

(b) It is possible for a generalized eigenvector of a linear operator T 
to correspond to a scalar that is not an eigenvalue of T. 

(c) Any linear operator on a finite-dimensional vector space has a Jor- 
dan canonical form. 

(d) A cycle of generalized eigenvectors is linearly independent. 

(e) There is exactly one cycle of generalized eigenvectors correspond- 
ing to each eigenvalue of a linear operator on a finite-dimensional 
vector space. 

(f) Let $T$ be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct eigenvalues of $T$. If, for each $i$, $\beta_i$ is a basis for $K_{\lambda_i}$, then $\beta_1 \cup \beta_2 \cup \cdots \cup \beta_k$ is a Jordan canonical basis for $T$.

(g) For any Jordan block $J$, the operator $L_J$ has Jordan canonical form $J$.

(h) Let $T$ be a linear operator on an $n$-dimensional vector space whose characteristic polynomial splits. Then, for any eigenvalue $\lambda$ of $T$, $K_\lambda = N((T - \lambda I)^n)$.
2. For each matrix $A$, find a basis for each generalized eigenspace of $L_A$ consisting of a union of disjoint cycles of generalized eigenvectors. Then find a Jordan canonical form $J$ of $A$.


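[Matrices (a) to (d) are illegible in this copy. Recoverable fragments: the matrix in (c) has second row $(21, -8, -11)$, and the matrix in (d) has a row $(0, 0, 3, 0)$.]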


3. For each linear operator $T$, find a basis for each generalized eigenspace of $T$ consisting of a union of disjoint cycles of generalized eigenvectors. Then find a Jordan canonical form $J$ of $T$.

(a) $T$ is the linear operator on $P_2(R)$ defined by $T(f(x)) = 2f(x) - f'(x)$.
(b) $V$ is the real vector space of functions spanned by the set of real-valued functions $\{1, t, t^2, e^t, te^t\}$, and $T$ is the linear operator on $V$ defined by $T(f) = f'$.
(c) $T$ is the linear operator on $M_{2\times 2}(R)$ defined by $T(A) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \cdot A$ for all $A \in M_{2\times 2}(R)$.
(d) $T(A) = 2A + A^t$ for all $A \in M_{2\times 2}(R)$.


4. Let $T$ be a linear operator on a vector space $V$, and let $\gamma$ be a cycle of generalized eigenvectors that corresponds to the eigenvalue $\lambda$. Prove that $\operatorname{span}(\gamma)$ is a $T$-invariant subspace of $V$.


5. Let $\gamma_1, \gamma_2, \ldots, \gamma_p$ be cycles of generalized eigenvectors of a linear operator $T$ corresponding to an eigenvalue $\lambda$. Prove that if the initial eigenvectors are distinct, then the cycles are disjoint.


6. Let $T\colon V \to W$ be a linear transformation. Prove the following results.

(a) $N(T) = N(-T)$.
(b) $N(T^k) = N((-T)^k)$.
(c) If $V = W$ (so that $T$ is a linear operator on $V$) and $\lambda$ is an eigenvalue of $T$, then for any positive integer $k$

$$N((T - \lambda I_V)^k) = N((\lambda I_V - T)^k).$$


7. Let $U$ be a linear operator on a finite-dimensional vector space $V$. Prove the following results.

(a) $N(U) \subseteq N(U^2) \subseteq \cdots \subseteq N(U^k) \subseteq N(U^{k+1}) \subseteq \cdots$.

(b) If $\operatorname{rank}(U^m) = \operatorname{rank}(U^{m+1})$ for some positive integer $m$, then $\operatorname{rank}(U^m) = \operatorname{rank}(U^k)$ for any positive integer $k \ge m$.

(c) If $\operatorname{rank}(U^m) = \operatorname{rank}(U^{m+1})$ for some positive integer $m$, then $N(U^m) = N(U^k)$ for any positive integer $k \ge m$.

(d) Let $T$ be a linear operator on $V$, and let $\lambda$ be an eigenvalue of $T$. Prove that if $\operatorname{rank}((T - \lambda I)^m) = \operatorname{rank}((T - \lambda I)^{m+1})$ for some integer $m$, then $K_\lambda = N((T - \lambda I)^m)$.

(e) Second Test for Diagonalizability. Let $T$ be a linear operator on $V$ whose characteristic polynomial splits, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct eigenvalues of $T$. Then $T$ is diagonalizable if and only if $\operatorname{rank}(T - \lambda_i I) = \operatorname{rank}((T - \lambda_i I)^2)$ for $1 \le i \le k$.

(f) Use (e) to obtain a simpler proof of Exercise 24 of Section 5.4: If $T$ is a diagonalizable linear operator on a finite-dimensional vector space $V$ and $W$ is a $T$-invariant subspace of $V$, then $T_W$ is diagonalizable.


8. Use Theorem 7.4 to prove that the vectors $v_1, v_2, \ldots, v_k$ in the statement of Theorem 7.3 are unique.

9. Let $T$ be a linear operator on a finite-dimensional vector space $V$ whose characteristic polynomial splits.

(a) Prove Theorem 7.5(b).

(b) Suppose that $\beta$ is a Jordan canonical basis for $T$, and let $\lambda$ be an eigenvalue of $T$. Let $\beta' = \beta \cap K_\lambda$. Prove that $\beta'$ is a basis for $K_\lambda$.

10. Let $T$ be a linear operator on a finite-dimensional vector space whose characteristic polynomial splits, and let $\lambda$ be an eigenvalue of $T$.

(a) Suppose that $\gamma$ is a basis for $K_\lambda$ consisting of the union of $q$ disjoint cycles of generalized eigenvectors. Prove that $q \le \dim(E_\lambda)$.

(b) Let $\beta$ be a Jordan canonical basis for $T$, and suppose that $J = [T]_\beta$ has $q$ Jordan blocks with $\lambda$ in the diagonal positions. Prove that $q \le \dim(E_\lambda)$.

11. Prove Corollary 2 to Theorem 7.7.


Exercises 12 and 13 are concerned with direct sums of matrices, defined in 
Section 5.4 on page 320. 


12. Prove Theorem 7.8.

13. Let $T$ be a linear operator on a finite-dimensional vector space $V$ such that the characteristic polynomial of $T$ splits, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct eigenvalues of $T$. For each $i$, let $J_i$ be the Jordan canonical form of the restriction of $T$ to $K_{\lambda_i}$. Prove that

$$J = J_1 \oplus J_2 \oplus \cdots \oplus J_k$$

is the Jordan canonical form of $T$.
7.2. THE JORDAN CANONICAL FORM II


For the purposes of this section, we fix a linear operator $T$ on an $n$-dimensional vector space $V$ such that the characteristic polynomial of $T$ splits. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct eigenvalues of $T$.


By Theorem 7.7 (p. 490), each generalized eigenspace $K_{\lambda_i}$ contains an ordered basis $\beta_i$ consisting of a union of disjoint cycles of generalized eigenvectors corresponding to $\lambda_i$. So by Theorems 7.4(b) (p. 487) and 7.5 (p. 489), the union $\beta = \bigcup_{i=1}^{k} \beta_i$ is a Jordan canonical basis for $T$. For each $i$, let $T_i$ be the restriction of $T$ to $K_{\lambda_i}$, and let $A_i = [T_i]_{\beta_i}$. Then $A_i$ is the Jordan canonical form of $T_i$, and

$$J = [T]_\beta = \begin{pmatrix} A_1 & O & \cdots & O \\ O & A_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & A_k \end{pmatrix}$$

is the Jordan canonical form of $T$. In this matrix, each $O$ is a zero matrix of appropriate size.


In this section, we compute the matrices $A_i$ and the bases $\beta_i$, thereby computing $J$ and $\beta$ as well. While developing a method for finding $J$, it becomes evident that in some sense the matrices $A_i$ are unique.

To aid in formulating the uniqueness theorem for $J$, we adopt the following convention: The basis $\beta_i$ for $K_{\lambda_i}$ will henceforth be ordered in such a way that the cycles appear in order of decreasing length. That is, if $\beta_i$ is a disjoint union of cycles $\gamma_1, \gamma_2, \ldots, \gamma_{n_i}$ and if the length of the cycle $\gamma_j$ is $p_j$, we index the cycles so that $p_1 \ge p_2 \ge \cdots \ge p_{n_i}$. This ordering of the cycles limits the possible orderings of vectors in $\beta_i$, which in turn determines the matrix $A_i$. It is in this sense that $A_i$ is unique. It then follows that the Jordan canonical form for $T$ is unique up to an ordering of the eigenvalues of $T$. As we will see, there is no uniqueness theorem for the bases $\beta_i$ or for $\beta$. Specifically, we show that for each $i$, the number $n_i$ of cycles that form $\beta_i$, and the length $p_j$ ($j = 1, 2, \ldots, n_i$) of each cycle, is completely determined by $T$.


Example 1 


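To illustrate the discussion above, suppose that, for some $i$, the ordered basis $\beta_i$ for $K_{\lambda_i}$ is the union of four cycles $\beta_i = \gamma_1 \cup \gamma_2 \cup \gamma_3 \cup \gamma_4$ with respective lengths $p_1 = 3$, $p_2 = 3$, $p_3 = 2$, and $p_4 = 1$. Then

$$A_i = \left(\begin{array}{ccc|ccc|cc|c}
\lambda_i & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \lambda_i & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \lambda_i & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & \lambda_i & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda_i & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \lambda_i & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & \lambda_i & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \lambda_i & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \lambda_i
\end{array}\right).$$

♦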


To help us visualize each of the matrices $A_i$ and ordered bases $\beta_i$, we use an array of dots called a dot diagram of $T_i$, where $T_i$ is the restriction of $T$ to $K_{\lambda_i}$. Suppose that $\beta_i$ is a disjoint union of cycles of generalized eigenvectors $\gamma_1, \gamma_2, \ldots, \gamma_{n_i}$ with lengths $p_1 \ge p_2 \ge \cdots \ge p_{n_i}$, respectively. The dot diagram of $T_i$ contains one dot for each vector in $\beta_i$, and the dots are configured according to the following rules.

1. The array consists of $n_i$ columns (one column for each cycle).

2. Counting from left to right, the $j$th column consists of the $p_j$ dots that correspond to the vectors of $\gamma_j$ starting with the initial vector at the top and continuing down to the end vector.


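Denote the end vectors of the cycles by $v_1, v_2, \ldots, v_{n_i}$. In the following dot diagram of $T_i$, each dot is labeled with the name of the vector in $\beta_i$ to which it corresponds.

$$\begin{array}{llll}
\bullet\,(T - \lambda_i I)^{p_1 - 1}(v_1) & \bullet\,(T - \lambda_i I)^{p_2 - 1}(v_2) & \cdots & \bullet\,(T - \lambda_i I)^{p_{n_i} - 1}(v_{n_i}) \\
\bullet\,(T - \lambda_i I)^{p_1 - 2}(v_1) & \bullet\,(T - \lambda_i I)^{p_2 - 2}(v_2) & \cdots & \bullet\,(T - \lambda_i I)^{p_{n_i} - 2}(v_{n_i}) \\
\vdots & \vdots & & \\
 & \bullet\,(T - \lambda_i I)(v_2) & & \\
 & \bullet\, v_2 & & \\
\bullet\,(T - \lambda_i I)(v_1) & & & \\
\bullet\, v_1 & & &
\end{array}$$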


Notice that the dot diagram of $T_i$ has $n_i$ columns (one for each cycle) and $p_1$ rows. Since $p_1 \ge p_2 \ge \cdots \ge p_{n_i}$, the columns of the dot diagram become shorter (or at least not longer) as we move from left to right.

Now let $r_j$ denote the number of dots in the $j$th row of the dot diagram. Observe that $r_1 \ge r_2 \ge \cdots \ge r_{p_1}$. Furthermore, the diagram can be reconstructed from the values of the $r_j$'s. The proofs of these facts, which are combinatorial in nature, are treated in Exercise 9.
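In Example 1, with $n_i = 4$, $p_1 = p_2 = 3$, $p_3 = 2$, and $p_4 = 1$, the dot diagram of $T_i$ is as follows:

$$\begin{array}{cccc} \bullet & \bullet & \bullet & \bullet \\ \bullet & \bullet & \bullet & \\ \bullet & \bullet & & \end{array}$$

Here $r_1 = 4$, $r_2 = 3$, and $r_3 = 2$.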
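Passing between the column lengths $p_j$ and the row counts $r_j$ is the familiar operation of conjugating a partition: $r_j$ is the number of columns of length at least $j$, and symmetrically $p_i$ is the number of rows containing at least $i$ dots. A minimal Python sketch (the function name `conjugate` is ours), checked against the numbers of Example 1:

```python
def conjugate(parts):
    """Given column lengths p_1 >= p_2 >= ... of a dot diagram, return the row
    counts r_1 >= r_2 >= ...; applied to row counts, it returns the column lengths."""
    longest = max(parts, default=0)
    return [sum(1 for p in parts if p >= j) for j in range(1, longest + 1)]

p = [3, 3, 2, 1]        # cycle lengths in Example 1
r = conjugate(p)
print(r)                # [4, 3, 2]
print(conjugate(r))     # [3, 3, 2, 1]: the diagram is recovered from the r_j's
```

The same function serves in both directions because conjugating twice returns the original partition.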

We now devise a method for computing the dot diagram of $T_i$ using the ranks of linear operators determined by $T$ and $\lambda_i$. Hence the dot diagram is completely determined by $T$, from which it follows that it is unique. On the other hand, $\beta_i$ is not unique. For example, see Exercise 8. (It is for this reason that we associate the dot diagram with $T_i$ rather than with $\beta_i$.)

To determine the dot diagram of $T_i$, we devise a method for computing each $r_j$, the number of dots in the $j$th row of the dot diagram, using only $T$ and $\lambda_i$. The next three results give us the required method. To facilitate our arguments, we fix a basis $\beta_i$ for $K_{\lambda_i}$ so that $\beta_i$ is a disjoint union of $n_i$ cycles of generalized eigenvectors with lengths $p_1 \ge p_2 \ge \cdots \ge p_{n_i}$.


Theorem 7.9. For any positive integer $r$, the vectors in $\beta_i$ that are associated with the dots in the first $r$ rows of the dot diagram of $T_i$ constitute a basis for $N((T - \lambda_i I)^r)$. Hence the number of dots in the first $r$ rows of the dot diagram equals $\operatorname{nullity}((T - \lambda_i I)^r)$.


Proof. Clearly, $N((T - \lambda_i I)^r) \subseteq K_{\lambda_i}$, and $K_{\lambda_i}$ is invariant under $(T - \lambda_i I)^r$. Let $U$ denote the restriction of $(T - \lambda_i I)^r$ to $K_{\lambda_i}$. By the preceding remarks, $N((T - \lambda_i I)^r) = N(U)$, and hence it suffices to establish the theorem for $U$. Now define

$$S_1 = \{x \in \beta_i : U(x) = 0\} \quad \text{and} \quad S_2 = \{x \in \beta_i : U(x) \ne 0\}.$$

Let $a$ and $b$ denote the number of vectors in $S_1$ and $S_2$, respectively, and let $m_i = \dim(K_{\lambda_i})$. Then $a + b = m_i$. For any $x \in \beta_i$, $x \in S_1$ if and only if $x$ is one of the first $r$ vectors of a cycle, and this is true if and only if $x$ corresponds to a dot in the first $r$ rows of the dot diagram. Hence $a$ is the number of dots in the first $r$ rows of the dot diagram. For any $x \in S_2$, the effect of applying $U$ to $x$ is to move the dot corresponding to $x$ exactly $r$ places up its column to another dot. It follows that $U$ maps $S_2$ in a one-to-one fashion into $\beta_i$. Thus $\{U(x) : x \in S_2\}$ consists of $b$ distinct vectors of $\beta_i$, so it is linearly independent; it also spans $R(U)$, because $U$ maps every vector of $S_1$ to $0$. Hence it is a basis for $R(U)$ consisting of $b$ vectors, so $\operatorname{rank}(U) = b$ and $\operatorname{nullity}(U) = m_i - b = a$. But $S_1$ is a linearly independent subset of $N(U)$ consisting of $a$ vectors; therefore $S_1$ is a basis for $N(U)$. ■


In the case that r = 1, Theorem 7.9 yields the following corollary. 


Corollary. The dimension of $E_{\lambda_i}$ is $n_i$. Hence in a Jordan canonical form of $T$, the number of Jordan blocks corresponding to $\lambda_i$ equals the dimension of $E_{\lambda_i}$. ■
We are now able to devise a method for describing the dot diagram in terms of the ranks of operators.


Theorem 7.10. Let $r_j$ denote the number of dots in the $j$th row of the dot diagram of $T_i$, the restriction of $T$ to $K_{\lambda_i}$. Then the following statements are true.

(a) $r_1 = \dim(V) - \operatorname{rank}(T - \lambda_i I)$.
(b) $r_j = \operatorname{rank}((T - \lambda_i I)^{j-1}) - \operatorname{rank}((T - \lambda_i I)^{j})$ if $j > 1$.


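Proof. By Theorem 7.9, for $1 \le j \le p_1$, we have

$$r_1 + r_2 + \cdots + r_j = \operatorname{nullity}((T - \lambda_i I)^j) = \dim(V) - \operatorname{rank}((T - \lambda_i I)^j).$$

Hence

$$r_1 = \dim(V) - \operatorname{rank}(T - \lambda_i I),$$

and for $j > 1$,

$$\begin{aligned}
r_j &= (r_1 + r_2 + \cdots + r_j) - (r_1 + r_2 + \cdots + r_{j-1}) \\
&= [\dim(V) - \operatorname{rank}((T - \lambda_i I)^j)] - [\dim(V) - \operatorname{rank}((T - \lambda_i I)^{j-1})] \\
&= \operatorname{rank}((T - \lambda_i I)^{j-1}) - \operatorname{rank}((T - \lambda_i I)^j).
\end{aligned}$$

■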
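Theorem 7.10 turns the dot diagram into a rank computation. The Python sketch below is our own (the name `dot_diagram_rows` is hypothetical). It computes $r_1, r_2, \ldots$ for $T = L_A$ with exact `Fraction` arithmetic, using $\operatorname{rank}((A - \lambda I)^0) = \operatorname{rank}(I) = n$ so that one formula covers both parts of the theorem. It stops once the rows account for all $m$ dots, where the multiplicity $m$ of $\lambda$ must be supplied by the caller. The sample matrices are those of Examples 2 and 4 in this section.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def dot_diagram_rows(A, lam, m):
    """Row counts r_1, r_2, ... of the dot diagram for the eigenvalue lam of L_A,
    where m is the multiplicity of lam (the total number of dots)."""
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]   # (A - lam I)^0 = I
    rows, prev_rank = [], n
    while sum(rows) < m and len(rows) < n:
        P = matmul(P, N)                     # now P = (A - lam I)^j
        cur_rank = rank(P)
        rows.append(prev_rank - cur_rank)    # r_j = rank(N^(j-1)) - rank(N^j)
        prev_rank = cur_rank
    return rows

A2 = [[2, -1, 0, 1], [0, 3, -1, 0], [0, 1, 1, 0], [0, -1, 0, 3]]   # Example 2
print(dot_diagram_rows(A2, 2, 3))    # [2, 1]
print(dot_diagram_rows(A2, 3, 1))    # [1]

A4 = [[0, 1, 0, 0, 0, 0], [0, 0, 0, 2, 0, 0], [0, 0, 0, 0, 0, 1],
      [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]  # Example 4
print(dot_diagram_rows(A4, 0, 6))    # [3, 2, 1]
```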


Theorem 7.10 shows that the dot diagram of $T_i$ is completely determined by $T$ and $\lambda_i$. Hence we have proved the following result.

Corollary. For any eigenvalue $\lambda_i$ of $T$, the dot diagram of $T_i$ is unique. Thus, subject to the convention that the cycles of generalized eigenvectors for the bases of each generalized eigenspace are listed in order of decreasing length, the Jordan canonical form of a linear operator or a matrix is unique up to the ordering of the eigenvalues.


We apply these results to find the Jordan canonical forms of two matrices 
and a linear operator. 


Example 2
Let

$$A = \begin{pmatrix} 2 & -1 & 0 & 1 \\ 0 & 3 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 0 & 3 \end{pmatrix}.$$
We find the Jordan canonical form of $A$ and a Jordan canonical basis for the linear operator $T = L_A$. The characteristic polynomial of $A$ is

$$\det(A - tI) = (t - 2)^3(t - 3).$$

Thus $A$ has two distinct eigenvalues, $\lambda_1 = 2$ and $\lambda_2 = 3$, with multiplicities 3 and 1, respectively. Let $T_1$ and $T_2$ be the restrictions of $L_A$ to the generalized eigenspaces $K_{\lambda_1}$ and $K_{\lambda_2}$, respectively.

Suppose that $\beta_1$ is a Jordan canonical basis for $T_1$. Since $\lambda_1$ has multiplicity 3, it follows that $\dim(K_{\lambda_1}) = 3$ by Theorem 7.4(c) (p. 487); hence the dot diagram of $T_1$ has three dots. As we did earlier, let $r_j$ denote the number of dots in the $j$th row of this dot diagram. Then, by Theorem 7.10,

$$r_1 = 4 - \operatorname{rank}(A - 2I) = 4 - \operatorname{rank}\begin{pmatrix} 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} = 4 - 2 = 2,$$

and

$$r_2 = \operatorname{rank}(A - 2I) - \operatorname{rank}((A - 2I)^2) = 2 - 1 = 1.$$


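(Actually, the computation of $r_2$ is unnecessary in this case because $r_1 = 2$ and the dot diagram only contains three dots.) Hence the dot diagram associated with $\beta_1$ is

$$\begin{array}{cc} \bullet & \bullet \\ \bullet & \end{array}$$

So

$$A_1 = [T_1]_{\beta_1} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$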


Since $\lambda_2 = 3$ has multiplicity 1, it follows that $\dim(K_{\lambda_2}) = 1$, and consequently any basis $\beta_2$ for $K_{\lambda_2}$ consists of a single eigenvector corresponding to $\lambda_2 = 3$. Therefore

$$A_2 = [T_2]_{\beta_2} = (3).$$


Setting $\beta = \beta_1 \cup \beta_2$, we have

$$J = [L_A]_\beta = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix},$$

and so $J$ is the Jordan canonical form of $A$.


We now find a Jordan canonical basis for $T = L_A$. We begin by determining a Jordan canonical basis $\beta_1$ for $T_1$. Since the dot diagram of $T_1$ has two columns, each corresponding to a cycle of generalized eigenvectors, there are two such cycles. Let $v_1$ and $v_2$ denote the end vectors of the first and second cycles, respectively. We reprint below the dot diagram with the dots labeled with the names of the vectors to which they correspond.

$$\begin{array}{ll} \bullet\,(T - 2I)(v_1) & \bullet\, v_2 \\ \bullet\, v_1 & \end{array}$$


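From this diagram we see that $v_1 \in N((T - 2I)^2)$ but $v_1 \notin N(T - 2I)$. Now

$$A - 2I = \begin{pmatrix} 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} \quad \text{and} \quad (A - 2I)^2 = \begin{pmatrix} 0 & -2 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & -2 & 1 & 1 \end{pmatrix}.$$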


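It is easily seen that

$$\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 2 \end{pmatrix} \right\}$$

is a basis for $N((T - 2I)^2) = K_{\lambda_1}$. Of these three basis vectors, the last two do not belong to $N(T - 2I)$, and hence we select one of these for $v_1$. Suppose that we choose

$$v_1 = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix}.$$

Then

$$(T - 2I)(v_1) = (A - 2I)(v_1) = \begin{pmatrix} 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ -1 \\ -1 \end{pmatrix}.$$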


Now simply choose $v_2$ to be a vector in $E_{\lambda_1}$ that is linearly independent of $(T - 2I)(v_1)$; for example, select

$$v_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
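Thus we have associated the Jordan canonical basis

$$\beta_1 = \left\{ \begin{pmatrix} -1 \\ -1 \\ -1 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \right\}$$

with the dot diagram in the following manner.

$$\begin{array}{ll} \bullet \begin{pmatrix} -1 \\ -1 \\ -1 \\ -1 \end{pmatrix} & \bullet \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \\[1ex] \bullet \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix} & \end{array}$$

By Theorem 7.6 (p. 489), the linear independence of $\beta_1$ is guaranteed since $v_2$ was chosen to be linearly independent of $(T - 2I)(v_1)$.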

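Since $\lambda_2 = 3$ has multiplicity 1, $\dim(K_{\lambda_2}) = \dim(E_{\lambda_2}) = 1$. Hence any eigenvector of $L_A$ corresponding to $\lambda_2 = 3$ constitutes an appropriate basis $\beta_2$. For example,

$$\beta_2 = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} \right\}.$$

Thus

$$\beta = \beta_1 \cup \beta_2 = \left\{ \begin{pmatrix} -1 \\ -1 \\ -1 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} \right\}$$

is a Jordan canonical basis for $L_A$.

Notice that if

$$Q = \begin{pmatrix} -1 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix},$$

then $J = Q^{-1}AQ$. ♦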
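Once the dot diagrams are known, $J$ can be assembled block by block, and the relation $J = Q^{-1}AQ$ can be confirmed as $AQ = QJ$ ($Q$ is invertible because its columns form the basis $\beta$). A short Python sketch of both steps for this example; `jordan_matrix` and its list-of-pairs input format are our own conventions, not the text's:

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def jordan_matrix(blocks):
    """Block-diagonal Jordan matrix from a list of (eigenvalue, block size) pairs."""
    n = sum(size for _, size in blocks)
    J = [[0] * n for _ in range(n)]
    k = 0                                   # index where the current block starts
    for lam, size in blocks:
        for i in range(size):
            J[k + i][k + i] = lam
            if i + 1 < size:
                J[k + i][k + i + 1] = 1     # superdiagonal 1s inside each block
        k += size
    return J

A = [[2, -1, 0, 1], [0, 3, -1, 0], [0, 1, 1, 0], [0, -1, 0, 3]]
beta = [[-1, -1, -1, -1], [0, 1, 2, 0], [1, 0, 0, 0], [1, 0, 0, 1]]
Q = [list(col) for col in zip(*beta)]       # columns of Q are the vectors of beta
J = jordan_matrix([(2, 2), (2, 1), (3, 1)]) # dot-diagram columns: 2 and 1 for lam = 2, 1 for lam = 3
print(J)                                    # [[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
print(matmul(A, Q) == matmul(Q, J))         # True
```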
Example 3
Let

$$A = \begin{pmatrix} 2 & -4 & 2 & 2 \\ -2 & 0 & 1 & 3 \\ -2 & -2 & 3 & 3 \\ -2 & -6 & 3 & 7 \end{pmatrix}.$$

We find the Jordan canonical form $J$ of $A$, a Jordan canonical basis for $L_A$, and a matrix $Q$ such that $J = Q^{-1}AQ$.

The characteristic polynomial of $A$ is $\det(A - tI) = (t - 2)^2(t - 4)^2$. Let $T = L_A$, $\lambda_1 = 2$, and $\lambda_2 = 4$, and let $T_i$ be the restriction of $L_A$ to $K_{\lambda_i}$ for $i = 1, 2$.


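We begin by computing the dot diagram of $T_1$. Let $r_1$ denote the number of dots in the first row of this diagram. Then

$$r_1 = 4 - \operatorname{rank}(A - 2I) = 4 - 2 = 2.$$

Since $\lambda_1 = 2$ has multiplicity 2, the diagram has exactly two dots, so both lie in the first row; hence the dot diagram of $T_1$ is as follows.

$$\begin{array}{cc} \bullet & \bullet \end{array}$$

Therefore

$$A_1 = [T_1]_{\beta_1} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},$$

where $\beta_1$ is any basis corresponding to the dots. In this case, $\beta_1$ is an arbitrary basis for $E_{\lambda_1} = N(T - 2I)$, for example,

$$\beta_1 = \left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix} \right\}.$$

Next we compute the dot diagram of $T_2$. Since $\operatorname{rank}(A - 4I) = 3$, there is only $4 - 3 = 1$ dot in the first row of the diagram. Since $\lambda_2 = 4$ has multiplicity 2, we have $\dim(K_{\lambda_2}) = 2$, and hence this dot diagram has the following form:

$$\begin{array}{c} \bullet \\ \bullet \end{array}$$

Thus

$$A_2 = [T_2]_{\beta_2} = \begin{pmatrix} 4 & 1 \\ 0 & 4 \end{pmatrix},$$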
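where $\beta_2$ is any basis for $K_{\lambda_2}$ corresponding to the dots. In this case, $\beta_2$ is a cycle of length 2. The end vector of this cycle is a vector $v \in K_{\lambda_2} = N((T - 4I)^2)$ such that $v \notin N(T - 4I)$. One way of finding such a vector was used to select the vector $v_1$ in Example 2. In this example, we illustrate another method. A simple calculation shows that a basis for the null space of $L_A - 4I$ is

$$\left\{ \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix} \right\}.$$

Choose $v$ to be any solution to the system of linear equations

$$(A - 4I)x = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix},$$

for example,

$$v = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \end{pmatrix}.$$

Any such $v$ satisfies $(A - 4I)v \ne 0$ and $(A - 4I)^2 v = 0$, so $v$ lies in $K_{\lambda_2}$ but not in $E_{\lambda_2}$. Thus

$$\beta_2 = \{(L_A - 4I)(v),\ v\} = \left\{ \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \end{pmatrix} \right\}.$$

Therefore

$$\beta = \beta_1 \cup \beta_2 = \left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \end{pmatrix} \right\}$$

is a Jordan canonical basis for $L_A$. The corresponding Jordan canonical form is given by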
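$$J = [L_A]_\beta = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 4 & 1 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$

Finally, we define $Q$ to be the matrix whose columns are the vectors of $\beta$ listed in the same order, namely,

$$Q = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 1 & 1 & 1 & -1 \\ 0 & 2 & 1 & -1 \\ 2 & 0 & 1 & 0 \end{pmatrix}.$$

Then $J = Q^{-1}AQ$. ♦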
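The "other method" used here (solving $(A - 4I)x = w$ for a vector $w$ spanning $N(L_A - 4I)$) is easy to confirm by machine. The sketch below (helper names ours) checks that $w$ lies in the null space, that the chosen $v$ solves the system, and that $AQ = QJ$ for the $Q$ and $J$ found above.

```python
def matvec(M, v):
    """Multiply a matrix M (list of rows) by a column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, -4, 2, 2], [-2, 0, 1, 3], [-2, -2, 3, 3], [-2, -6, 3, 7]]
N = [[A[i][j] - (4 if i == j else 0) for j in range(4)] for i in range(4)]   # A - 4I
w = [0, 1, 1, 1]          # spans N(L_A - 4I)
v = [1, -1, -1, 0]        # the solution of (A - 4I)x = w chosen in the text
print(matvec(N, w))       # [0, 0, 0, 0]
print(matvec(N, v) == w)  # True, so (A - 4I)v != 0 but (A - 4I)^2 v = 0

Q = [[2, 0, 0, 1], [1, 1, 1, -1], [0, 2, 1, -1], [2, 0, 1, 0]]
J = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 4, 1], [0, 0, 0, 4]]
print(matmul(A, Q) == matmul(Q, J))   # True
```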


Example 4 


Let $V$ be the vector space of polynomial functions in two real variables $x$ and $y$ of degree at most 2. Then $V$ is a vector space over $R$ and $\alpha = \{1, x, y, x^2, y^2, xy\}$ is an ordered basis for $V$. Let $T$ be the linear operator on $V$ defined by

$$T(f(x, y)) = \frac{\partial}{\partial x} f(x, y).$$

For example, if $f(x, y) = x + 2x^2 - 3xy + y$, then

$$T(f(x, y)) = \frac{\partial}{\partial x}(x + 2x^2 - 3xy + y) = 1 + 4x - 3y.$$


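We find the Jordan canonical form and a Jordan canonical basis for $T$. Let $A = [T]_\alpha$. Then

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

and hence the characteristic polynomial of $T$ is

$$\det(A - tI) = \det\begin{pmatrix} -t & 1 & 0 & 0 & 0 & 0 \\ 0 & -t & 0 & 2 & 0 & 0 \\ 0 & 0 & -t & 0 & 0 & 1 \\ 0 & 0 & 0 & -t & 0 & 0 \\ 0 & 0 & 0 & 0 & -t & 0 \\ 0 & 0 & 0 & 0 & 0 & -t \end{pmatrix} = t^6.$$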


Thus $\lambda = 0$ is the only eigenvalue of $T$, and $K_\lambda = V$. For each $j$, let $r_j$ denote the number of dots in the $j$th row of the dot diagram of $T$. By Theorem 7.10,

$$r_1 = 6 - \operatorname{rank}(A) = 6 - 3 = 3,$$
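and since

$$A^2 = \begin{pmatrix} 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

$$r_2 = \operatorname{rank}(A) - \operatorname{rank}(A^2) = 3 - 1 = 2.$$

Because there are a total of six dots in the dot diagram and $r_1 = 3$ and $r_2 = 2$, it follows that $r_3 = 1$. So the dot diagram of $T$ is

$$\begin{array}{ccc} \bullet & \bullet & \bullet \\ \bullet & \bullet & \\ \bullet & & \end{array}$$

We conclude that the Jordan canonical form of $T$ is

$$J = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$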


We now find a Jordan canonical basis for $T$. Since the first column of the dot diagram of $T$ consists of three dots, we must find a polynomial $f_1(x, y)$ such that $\frac{\partial^2}{\partial x^2} f_1(x, y) \ne 0$. Examining the basis $\alpha = \{1, x, y, x^2, y^2, xy\}$ for $K_\lambda = V$, we see that $x^2$ is a suitable candidate. Setting $f_1(x, y) = x^2$, we see that

$$(T - \lambda I)(f_1(x, y)) = T(f_1(x, y)) = \frac{\partial}{\partial x}(x^2) = 2x$$

and

$$(T - \lambda I)^2(f_1(x, y)) = T^2(f_1(x, y)) = \frac{\partial^2}{\partial x^2}(x^2) = 2.$$

Likewise, since the second column of the dot diagram consists of two dots, we must find a polynomial $f_2(x, y)$ such that

$$\frac{\partial}{\partial x}(f_2(x, y)) \ne 0, \quad \text{but} \quad \frac{\partial^2}{\partial x^2}(f_2(x, y)) = 0.$$
Since our choice must be linearly independent of the polynomials already chosen for the first cycle, the only choice in $\alpha$ that satisfies these constraints is $xy$. So we set $f_2(x, y) = xy$. Thus

$$(T - \lambda I)(f_2(x, y)) = T(f_2(x, y)) = \frac{\partial}{\partial x}(xy) = y.$$

Finally, the third column of the dot diagram consists of a single polynomial that lies in the null space of $T$. The only remaining polynomial in $\alpha$ is $y^2$, and it is suitable here. So set $f_3(x, y) = y^2$. Therefore we have identified polynomials with the dots in the dot diagram as follows.

$$\begin{array}{lll} \bullet\, 2 & \bullet\, y & \bullet\, y^2 \\ \bullet\, 2x & \bullet\, xy & \\ \bullet\, x^2 & & \end{array}$$

Thus $\beta = \{2, 2x, x^2, y, xy, y^2\}$ is a Jordan canonical basis for $T$. ♦
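The cycle computations in this example can be mirrored with polynomials stored as dictionaries that map an exponent pair $(i, j)$ to the coefficient of $x^i y^j$ (an encoding chosen for this sketch; the names are ours). Applying $T = \partial/\partial x$ repeatedly to the chosen end vectors $x^2$, $xy$, and $y^2$ produces the cycles $\{2, 2x, x^2\}$, $\{y, xy\}$, and $\{y^2\}$, whose union is the basis $\beta$ above.

```python
def d_dx(f):
    """Partial derivative in x of a polynomial stored as {(i, j): coeff of x^i y^j}."""
    out = {}
    for (i, j), c in f.items():
        if i > 0:
            out[(i - 1, j)] = out.get((i - 1, j), 0) + i * c
    return {k: c for k, c in out.items() if c != 0}

def cycle(f):
    """Return [T^(p-1) f, ..., T f, f] for T = d/dx, stopping before the zero polynomial."""
    vecs = [f]
    while d_dx(vecs[0]):            # an empty dict is the zero polynomial
        vecs.insert(0, d_dx(vecs[0]))
    return vecs

x2, xy, y2 = {(2, 0): 1}, {(1, 1): 1}, {(0, 2): 1}
beta = cycle(x2) + cycle(xy) + cycle(y2)
print(beta)
# [{(0, 0): 2}, {(1, 0): 2}, {(2, 0): 1}, {(0, 1): 1}, {(1, 1): 1}, {(0, 2): 1}]
```

The same `d_dx` reproduces the worked example: applied to $x + 2x^2 - 3xy + y$ it returns the dictionary for $1 + 4x - 3y$.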


In the three preceding examples, we relied on our ingenuity and the context of the problem to find Jordan canonical bases. The reader can do the same in the exercises. We are successful in these cases because the dimensions of the generalized eigenspaces under consideration are small. We do not attempt, however, to develop a general algorithm for computing Jordan canonical bases, although one could be devised by following the steps in the proof of the existence of such a basis (Theorem 7.7, p. 490).

The following result may be thought of as a corollary to Theorem 7.10. 


Theorem 7.11. Let $A$ and $B$ be $n \times n$ matrices, each having Jordan canonical forms computed according to the conventions of this section. Then $A$ and $B$ are similar if and only if they have (up to an ordering of their eigenvalues) the same Jordan canonical form.

Proof. If $A$ and $B$ have the same Jordan canonical form $J$, then $A$ and $B$ are each similar to $J$ and hence are similar to each other.

Conversely, suppose that $A$ and $B$ are similar. Then $A$ and $B$ have the same eigenvalues. Let $J_A$ and $J_B$ denote the Jordan canonical forms of $A$ and $B$, respectively, with the same ordering of their eigenvalues. Then $A$ is similar to $J_A$, and, since $A$ is similar to $B$ and $B$ is similar to $J_B$, $A$ is also similar to $J_B$. Therefore, by the corollary to Theorem 2.23 (p. 115), $J_A$ and $J_B$ are matrix representations of $L_A$. Hence $J_A$ and $J_B$ are Jordan canonical forms of $L_A$. Thus $J_A = J_B$ by the corollary to Theorem 7.10. ■


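Example 5
We determine which of the matrices

$$A = \begin{pmatrix} -3 & 3 & -2 \\ -7 & 6 & -3 \\ 1 & -1 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 & -1 \\ -4 & 4 & -2 \\ -2 & 1 & 1 \end{pmatrix},$$

$$C = \begin{pmatrix} 0 & -1 & -1 \\ -3 & -1 & -2 \\ 7 & 5 & 6 \end{pmatrix}, \quad \text{and} \quad D = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}$$

are similar. Observe that $A$, $B$, and $C$ have the same characteristic polynomial $-(t - 1)(t - 2)^2$, whereas $D$ has $-t(t - 1)(t - 2)$ as its characteristic polynomial. Because similar matrices have the same characteristic polynomials, $D$ cannot be similar to $A$, $B$, or $C$. Let $J_A$, $J_B$, and $J_C$ be the Jordan canonical forms of $A$, $B$, and $C$, respectively, using the ordering 1, 2 for their common eigenvalues. Then (see Exercise 4)

$$J_A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \quad J_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \quad \text{and} \quad J_C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.$$

Since $J_A = J_C$, $A$ is similar to $C$. Since $J_B$ is different from $J_A$ and $J_C$, $B$ is similar to neither $A$ nor $C$. ♦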
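By Theorem 7.11, similarity of $A$, $B$, and $C$ can be decided from the Jordan block sizes alone. The Python sketch below is our own (all names are hypothetical). It combines Theorem 7.10 with the conjugation of the row counts to list the sizes of the Jordan blocks for each eigenvalue, whose multiplicity is supplied by hand, and then compares the results.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def block_sizes(A, lam, m):
    """Jordan block sizes for lam (largest first) via Theorem 7.10; m = multiplicity."""
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    rows, prev_rank = [], n
    while sum(rows) < m and len(rows) < n:
        P = matmul(P, N)
        cur_rank = rank(P)
        rows.append(prev_rank - cur_rank)          # r_j from Theorem 7.10
        prev_rank = cur_rank
    # The block sizes are the column lengths of the dot diagram.
    return tuple(sum(1 for r in rows if r >= i) for i in range(1, max(rows, default=0) + 1))

def jordan_data(M, eigenvalues):
    """(eigenvalue, block sizes) for each given (eigenvalue, multiplicity) pair."""
    return tuple((lam, block_sizes(M, lam, m)) for lam, m in eigenvalues)

A = [[-3, 3, -2], [-7, 6, -3], [1, -1, 2]]
B = [[0, 1, -1], [-4, 4, -2], [-2, 1, 1]]
C = [[0, -1, -1], [-3, -1, -2], [7, 5, 6]]
eigs = [(1, 1), (2, 2)]        # eigenvalue 1 with multiplicity 1, eigenvalue 2 with multiplicity 2
for name, M in (("A", A), ("B", B), ("C", C)):
    print(name, jordan_data(M, eigs))
# A ((1, (1,)), (2, (2,)))
# B ((1, (1,)), (2, (1, 1)))
# C ((1, (1,)), (2, (2,)))
```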


The reader should observe that any diagonal matrix is a Jordan canonical 
form. Thus a linear operator T on a finite-dimensional vector space V is diag- 
onalizable if and only if its Jordan canonical form is a diagonal matrix. Hence 
T is diagonalizable if and only if the Jordan canonical basis for T consists of 
eigenvectors of T. Similar statements can be made about matrices. Thus, 
of the matrices A, B, and C in Example 5, A and C are not diagonalizable 
because their Jordan canonical forms are not diagonal matrices. 


EXERCISES 


1. Label the following statements as true or false. Assume that the char- 
acteristic polynomial of the matrix or linear operator splits. 


(a) The Jordan canonical form of a diagonal matrix is the matrix itself. 

(b) Let T be a linear operator on a finite-dimensional vector space V
that has a Jordan canonical form J. If β is any basis for V, then
the Jordan canonical form of $[T]_\beta$ is J.

(c) Linear operators having the same characteristic polynomial are 
similar. 

(d) Matrices having the same Jordan canonical form are similar. 

(e) Every matrix is similar to its Jordan canonical form. 

(f) Every linear operator with the characteristic polynomial
$(-1)^n(t - \lambda)^n$ has the same Jordan canonical form.

(g) Every linear operator on a finite-dimensional vector space has a 
unique Jordan canonical basis. 

(h) The dot diagrams of a linear operator on a finite-dimensional vec- 
tor space are unique. 
2. Let T be a linear operator on a finite-dimensional vector space V such
that the characteristic polynomial of T splits. Suppose that $\lambda_1 = 2$,
$\lambda_2 = 4$, and $\lambda_3 = -3$ are the distinct eigenvalues of T and that the dot
diagrams for the restriction of T to $K_{\lambda_i}$ ($i = 1, 2, 3$) are as follows:


$\lambda_1 = 2$     $\lambda_2 = 4$     $\lambda_3 = -3$
• • • • • • •
• • •
• •


Find the Jordan canonical form J of T. 


3. Let T be a linear operator on a finite-dimensional vector space V with
Jordan canonical form


oocooqoo wy 
oOCOCOOOoOnrF 


0 
0 
0 
0 
0 
13 07 
3 


0 0 
0 0 


(a) Find the characteristic polynomial of T.
(b) Find the dot diagram corresponding to each eigenvalue of T.
(c) For which eigenvalues $\lambda_i$, if any, does $E_{\lambda_i} = K_{\lambda_i}$?
(d) For each eigenvalue $\lambda_i$, find the smallest positive integer $p_i$ for
which $K_{\lambda_i} = N((T - \lambda_i I)^{p_i})$.
(e) Compute the following numbers for each i, where $U_i$ denotes the
restriction of $T - \lambda_i I$ to $K_{\lambda_i}$.
(i) $\operatorname{rank}(U_i)$
(ii) $\operatorname{rank}(U_i^2)$
(iii) $\operatorname{nullity}(U_i)$
(iv) $\operatorname{nullity}(U_i^2)$
4. For each of the matrices A that follow, find a Jordan canonical form
J and an invertible matrix Q such that $J = Q^{-1}AQ$. Notice that the
matrices in (a), (b), and (c) are those used in Example 5.


(a) $A = \begin{pmatrix} -3 & 3 & -2 \\ -7 & 6 & -3 \\ 1 & -1 & 2 \end{pmatrix}$     (b) $A = \begin{pmatrix} 0 & 1 & -1 \\ -4 & 4 & -2 \\ -2 & 1 & 1 \end{pmatrix}$

(c) $A = \begin{pmatrix} 0 & -1 & -1 \\ -3 & -1 & -2 \\ 7 & 5 & 6 \end{pmatrix}$     (d) A = …
5. For each linear operator T, find a Jordan canonical form J of T and a
Jordan canonical basis β for T.


(a) V is the real vector space of functions spanned by the set of real-valued
functions $\{e^t, te^t, t^2e^t, e^{2t}\}$, and T is the linear operator on
V defined by $T(f) = f'$.

(b) T is the linear operator on $P_3(R)$ defined by $T(f(x)) = xf'(x)$.

(c) T is the linear operator on $P_3(R)$ defined by
$T(f(x)) = f''(x) + 2f(x)$.

(d) T is the linear operator on $M_{2\times 2}(R)$ defined by


(e) T is the linear operator on $M_{2\times 2}(R)$ defined by


T(A) = é ) (A At), 


(f) V is the vector space of polynomial functions in two real variables
x and y of degree at most 2, as defined in Example 4, and T is the
linear operator on V defined by

$$T(f(x, y)) = \frac{\partial}{\partial x} f(x, y) + \frac{\partial}{\partial y} f(x, y).$$


6. Let A be an n × n matrix whose characteristic polynomial splits. Prove
that A and $A^t$ have the same Jordan canonical form, and conclude that
A and $A^t$ are similar. Hint: For any eigenvalue λ of A and $A^t$ and any
positive integer r, show that $\operatorname{rank}((A - \lambda I)^r) = \operatorname{rank}((A^t - \lambda I)^r)$.


7. Let A be an n × n matrix whose characteristic polynomial splits, γ be
a cycle of generalized eigenvectors corresponding to an eigenvalue λ,
and W be the subspace spanned by γ. Define γ′ to be the ordered set
obtained from γ by reversing the order of the vectors in γ.


(a) Prove that $[T_W]_{\gamma'} = ([T_W]_\gamma)^t$.

(b) Let J be the Jordan canonical form of A. Use (a) to prove that J
and $J^t$ are similar.

(c) Use (b) to prove that A and $A^t$ are similar.


8. Let T be a linear operator on a finite-dimensional vector space, and
suppose that the characteristic polynomial of T splits. Let β be a Jordan
canonical basis for T.


(a) Prove that for any nonzero scalar c, $\{cx : x \in \beta\}$ is a Jordan canonical
basis for T.

(b) Suppose that γ is one of the cycles of generalized eigenvectors that
forms β, and suppose that γ corresponds to the eigenvalue λ and
has length greater than 1. Let x be the end vector of γ, and let y
be a nonzero vector in $E_\lambda$. Let γ′ be the ordered set obtained from
γ by replacing x by x + y. Prove that γ′ is a cycle of generalized
eigenvectors corresponding to λ, and that if γ′ replaces γ in the
union that defines β, then the new union is also a Jordan canonical
basis for T.

(c) Apply (b) to obtain a Jordan canonical basis for $L_A$, where A is the
matrix given in Example 2, that is different from the basis given
in the example.


9. Suppose that a dot diagram has k columns and m rows with $p_j$ dots in
column j and $r_i$ dots in row i. Prove the following results.


(a) $m = p_1$ and $k = r_1$.

(b) $p_j = \max\{i : r_i \ge j\}$ for $1 \le j \le k$ and $r_i = \max\{j : p_j \ge i\}$ for
$1 \le i \le m$. Hint: Use mathematical induction on m.

(c) $r_1 \ge r_2 \ge \cdots \ge r_m$.

(d) Deduce that the number of dots in each column of a dot diagram 
is completely determined by the number of dots in the rows. 


10. Let T be a linear operator whose characteristic polynomial splits, and
let λ be an eigenvalue of T.

(a) Prove that $\dim(K_\lambda)$ is the sum of the lengths of all the blocks
corresponding to λ in the Jordan canonical form of T.

(b) Deduce that $E_\lambda = K_\lambda$ if and only if all the Jordan blocks corresponding
to λ are 1 × 1 matrices.


The following definitions are used in Exercises 11-19. 


Definitions. A linear operator T on a vector space V is called nilpotent
if $T^p = T_0$ for some positive integer p. An n × n matrix A is called nilpotent
if $A^p = O$ for some positive integer p.


11. Let T be a linear operator on a finite-dimensional vector space V, and
let β be an ordered basis for V. Prove that T is nilpotent if and only if
$[T]_\beta$ is nilpotent.


12. Prove that any square upper triangular matrix with each diagonal entry
equal to zero is nilpotent.


13. Let T be a nilpotent operator on an n-dimensional vector space V, and
suppose that p is the smallest positive integer for which $T^p = T_0$. Prove
the following results.


(a) $N(T^i) \subseteq N(T^{i+1})$ for every positive integer i.
(b) There is a sequence of ordered bases $\beta_1, \beta_2, \ldots, \beta_p$ such that $\beta_i$ is
a basis for $N(T^i)$ and $\beta_{i+1}$ contains $\beta_i$ for $1 \le i \le p - 1$.

(c) Let $\beta = \beta_p$ be the ordered basis for $N(T^p) = V$ in (b). Then $[T]_\beta$
is an upper triangular matrix with each diagonal entry equal to
zero.

(d) The characteristic polynomial of T is $(-1)^n t^n$. Hence the characteristic
polynomial of T splits, and 0 is the only eigenvalue of T.


14. Prove the converse of Exercise 13(d): If T is a linear operator on an
n-dimensional vector space V and $(-1)^n t^n$ is the characteristic polynomial
of T, then T is nilpotent.


15. Give an example of a linear operator T on a finite-dimensional vector
space such that T is not nilpotent, but zero is the only eigenvalue of T.
Characterize all such operators.


16. Let T be a nilpotent linear operator on a finite-dimensional vector space
V. Recall from Exercise 13 that $\lambda = 0$ is the only eigenvalue of T, and
hence $V = K_\lambda$. Let β be a Jordan canonical basis for T. Prove that for
any positive integer i, if we delete from β the vectors corresponding to
the last i dots in each column of a dot diagram of β, the resulting set is
a basis for $R(T^i)$. (If a column of the dot diagram contains fewer than i
dots, all the vectors associated with that column are removed from β.)


17. Let T be a linear operator on a finite-dimensional vector space V such
that the characteristic polynomial of T splits, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be
the distinct eigenvalues of T. Let $S\colon V \to V$ be the mapping defined by

$$S(x) = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_k v_k,$$

where, for each i, $v_i$ is the unique vector in $K_{\lambda_i}$ such that $x = v_1 +
v_2 + \cdots + v_k$. (This unique representation is guaranteed by Theorem 7.3
(p. 486) and Exercise 8 of Section 7.1.)


(a) Prove that S is a diagonalizable linear operator on V.
(b) Let $U = T - S$. Prove that U is nilpotent and commutes with S,
that is, $SU = US$.


18. Let T be a linear operator on a finite-dimensional vector space V, and
let J be the Jordan canonical form of T. Let D be the diagonal matrix
whose diagonal entries are the diagonal entries of J, and let $M = J - D$.
Prove the following results.


(a) M is nilpotent.
(b) $MD = DM$.
(c) If p is the smallest positive integer for which $M^p = O$, then, for
any positive integer $r < p$,

$$J^r = D^r + rD^{r-1}M + \frac{r(r-1)}{2!}D^{r-2}M^2 + \cdots + rDM^{r-1} + M^r,$$

and, for any positive integer $r \ge p$,

$$J^r = D^r + rD^{r-1}M + \frac{r(r-1)}{2!}D^{r-2}M^2 + \cdots + \frac{r!}{(r-p+1)!\,(p-1)!}D^{r-p+1}M^{p-1}.$$
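19. Let

$$J = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ 0 & 0 & \lambda & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}$$

be the m × m Jordan block corresponding to λ, and let $N = J - \lambda I$.
Prove the following results:


(a) $N^m = O$, and for $1 \le r < m$,

$$(N^r)_{ij} = \begin{cases} 1 & \text{if } j = i + r, \\ 0 & \text{otherwise.} \end{cases}$$

(b) For any integer $r \ge m$,

$$J^r = \begin{pmatrix} \lambda^r & r\lambda^{r-1} & \frac{r(r-1)}{2!}\lambda^{r-2} & \cdots & \frac{r(r-1)\cdots(r-m+2)}{(m-1)!}\lambda^{r-m+1} \\ 0 & \lambda^r & r\lambda^{r-1} & \cdots & \frac{r(r-1)\cdots(r-m+3)}{(m-2)!}\lambda^{r-m+2} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda^r \end{pmatrix}.$$

(c) $\lim_{r\to\infty} J^r$ exists if and only if one of the following holds:
(i) $|\lambda| < 1$.
(ii) $\lambda = 1$ and $m = 1$.
(Note that $\lim_{r\to\infty} \lambda^r$ exists under these conditions. See the discussion
preceding Theorem 5.13 on page 285.) Furthermore, $\lim_{r\to\infty} J^r$
is the zero matrix if condition (i) holds and is the 1 × 1 matrix (1)
if condition (ii) holds.
(d) Prove Theorem 5.13 on page 285.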


The following definition is used in Exercises 20 and 21.


Definition. For any $A \in M_{n\times n}(C)$, define the norm of A by

$$\|A\| = \max\{|A_{ij}| : 1 \le i, j \le n\}.$$


20. Let $A, B \in M_{n\times n}(C)$. Prove the following results.
(a) $\|A\| \ge 0$ and $\|A\| = 0$ if and only if $A = O$.
(b) $\|cA\| = |c| \cdot \|A\|$ for any scalar c.
(c) $\|A + B\| \le \|A\| + \|B\|$.
(d) $\|AB\| \le n\|A\|\,\|B\|$.


21. Let $A \in M_{n\times n}(C)$ be a transition matrix. (See Section 5.3.) Since C is
an algebraically closed field, A has a Jordan canonical form J to which
A is similar. Let P be an invertible matrix such that $P^{-1}AP = J$.
Prove the following results.


(a) $\|A^m\| \le 1$ for every positive integer m.

(b) There exists a positive number c such that $\|J^m\| \le c$ for every
positive integer m.

(c) Each Jordan block of J corresponding to the eigenvalue $\lambda = 1$ is a
1 × 1 matrix.

(d) $\lim_{m\to\infty} A^m$ exists if and only if 1 is the only eigenvalue of A with
absolute value 1.
(e) Theorem 5.20(a) using (c) and Theorem 5.19.


The next exercise requires knowledge of absolutely convergent series as well
as the definition of $e^A$ for a matrix A. (See page 312.)


22. Use Exercise 20(d) to prove that $e^A$ exists for every $A \in M_{n\times n}(C)$.


23. Let $x' = Ax$ be a system of n linear differential equations, where x is
an n-tuple of differentiable functions $x_1(t), x_2(t), \ldots, x_n(t)$ of the real
variable t, and A is an n × n coefficient matrix as in Exercise 15 of
Section 5.2. In contrast to that exercise, however, do not assume that
A is diagonalizable, but assume that the characteristic polynomial of A
splits. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct eigenvalues of A.

(a) Prove that if u is the end vector of a cycle of generalized eigenvectors
of $L_A$ of length p and u corresponds to the eigenvalue $\lambda_i$, then
for any polynomial f(t) of degree less than p, the function

$$e^{\lambda_i t}\left[f(t)(A - \lambda_i I)^{p-1} + f'(t)(A - \lambda_i I)^{p-2} + \cdots + f^{(p-1)}(t)\right]u$$

is a solution to the system $x' = Ax$.

(b) Prove that the general solution to $x' = Ax$ is a sum of the functions
of the form given in (a), where the vectors u are the end vectors of
the distinct cycles that constitute a fixed Jordan canonical basis
for $L_A$.


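24. Use Exercise 23 to find the general solution to each of the following systems
of linear equations, where x, y, and z are real-valued differentiable
functions of the real variable t.

$$\text{(a)}\ \begin{aligned} x' &= 2x + y \\ y' &= 2y - z \\ z' &= 3z \end{aligned} \qquad\qquad \text{(b)}\ \begin{aligned} x' &= 2x + y \\ y' &= 2y + z \\ z' &= 2z \end{aligned}$$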
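The formula in Exercise 19(b) can be spot-checked numerically before it is proved: entry (i, j) of $J^r$ should be $\binom{r}{j-i}\lambda^{r-(j-i)}$ when $0 \le j - i \le r$ and 0 otherwise (the case λ = 0 is the content of Exercise 19(a)). The Python sketch below (standard library only; the function names are hypothetical) compares repeated multiplication against that formula for a few integer values of λ.

```python
from math import comb


def jordan_block(lam, m):
    """The m x m Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(m)]
            for i in range(m)]


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(X, r):
    """X^r by repeated multiplication (X^0 is the identity)."""
    n = len(X)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = matmul(result, X)
    return result


def binomial_formula(lam, m, r):
    """Predicted J^r: C(r, d) * lam^(r - d) on the d-th superdiagonal (d <= r)."""
    return [[comb(r, j - i) * lam ** (r - (j - i)) if 0 <= j - i <= r else 0
             for j in range(m)] for i in range(m)]


for lam in (-2, 0, 3):
    for r in range(0, 9):
        assert matpow(jordan_block(lam, 4), r) == binomial_formula(lam, 4, r)
print(binomial_formula(3, 3, 4))  # [[81, 108, 54], [0, 81, 108], [0, 0, 81]]
```

The check works because $J = \lambda I + N$ with $\lambda I$ and N commuting, so the binomial theorem applies; a numerical agreement is of course not a proof.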


7.3 THE MINIMAL POLYNOMIAL


The Cayley-Hamilton theorem (Theorem 5.23 p. 317) tells us that for any 
linear operator T on an n-dimensional vector space, there is a polynomial 
f(t) of degree n such that $f(T) = T_0$, namely, the characteristic polynomial
of T. Hence there is a polynomial of least degree with this property, and this 
degree is at most n. If g(t) is such a polynomial, we can divide g(t) by its 
leading coefficient to obtain another polynomial p(t) of the same degree with 
leading coefficient 1, that is, p(t) is a monic polynomial. (See Appendix E.) 


Definition. Let T be a linear operator on a finite-dimensional vector 
space. A polynomial p(t) is called a minimal polynomial of T if p(t) is a 
monic polynomial of least positive degree for which $p(T) = T_0$.
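For a matrix (the matrix version of this definition appears later in the section), the definition can be turned into a direct computation: the first power $A^d$ that is a linear combination of $I, A, \ldots, A^{d-1}$ gives the monic relation of least degree, and by the Cayley-Hamilton theorem this happens for some $d \le n$. The Python sketch below assumes rational entries and uses exact `Fraction` arithmetic; `solve` and `minimal_polynomial` are hypothetical helper names, and coefficients are listed constant term first.

```python
from fractions import Fraction


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve(columns, target):
    """Return c with sum_j c[j] * columns[j] == target, or None if no solution.

    The columns are assumed linearly independent, so a solution is unique.
    """
    m, k = len(target), len(columns)
    # Augmented matrix: one row per coordinate; the last entry holds the target.
    rows = [[Fraction(col[i]) for col in columns] + [Fraction(target[i])]
            for i in range(m)]
    for c in range(k):
        pivot = next((i for i in range(c, m) if rows[i][c] != 0), None)
        if pivot is None:
            return None  # only possible if the columns were dependent
        rows[c], rows[pivot] = rows[pivot], rows[c]
        p = rows[c][c]
        rows[c] = [x / p for x in rows[c]]
        for i in range(m):
            if i != c and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[c])]
    # Rows below the k pivots have zero coefficients; they need a zero target.
    if any(rows[i][k] != 0 for i in range(k, m)):
        return None
    return [rows[i][k] for i in range(k)]


def minimal_polynomial(A):
    """Coefficients [c_0, ..., c_{d-1}, 1] of the minimal polynomial of A."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    lower_powers = []  # flattened I, A, ..., A^(d-1)
    for _ in range(n + 1):
        flat = [x for row in power for x in row]
        if lower_powers:
            c = solve(lower_powers, [-x for x in flat])
            if c is not None:  # A^d + c_{d-1} A^(d-1) + ... + c_0 I = O
                return c + [1]
        lower_powers.append(flat)
        power = matmul(power, A)
    raise AssertionError("unreachable by the Cayley-Hamilton theorem")


print([str(c) for c in minimal_polynomial([[3, -1, 0], [0, 2, 0], [1, -1, 2]])])
# ['6', '-5', '1'], i.e. t^2 - 5t + 6
```

The loop stops at the first dependence, which is exactly the "least positive degree" requirement; monicity comes from fixing the coefficient of $A^d$ to 1.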


The preceding discussion shows that every linear operator on a finite- 
dimensional vector space has a minimal polynomial. The next result shows 
that it is unique. 


Theorem 7.12. Let p(t) be a minimal polynomial of a linear operator T
on a finite-dimensional vector space V.
(a) For any polynomial g(t), if $g(T) = T_0$, then p(t) divides g(t). In particular,
p(t) divides the characteristic polynomial of T.
(b) The minimal polynomial of T is unique.


Proof. (a) Let g(t) be a polynomial for which $g(T) = T_0$. By the division
algorithm for polynomials (Theorem E.1 of Appendix E, p. 562), there exist
polynomials q(t) and r(t) such that

$$g(t) = q(t)p(t) + r(t), \tag{1}$$
where r(t) has degree less than the degree of p(t). Substituting T into (1)
and using that $g(T) = p(T) = T_0$, we have $r(T) = T_0$. Since r(t) has degree
less than p(t) and p(t) is a minimal polynomial of T, r(t) must be the zero
polynomial. Thus (1) simplifies to $g(t) = q(t)p(t)$, proving (a).

(b) Suppose that $p_1(t)$ and $p_2(t)$ are each minimal polynomials of T. Then
$p_1(t)$ divides $p_2(t)$ by (a). Since $p_1(t)$ and $p_2(t)$ have the same degree, we have
that $p_2(t) = cp_1(t)$ for some nonzero scalar c. Because $p_1(t)$ and $p_2(t)$ are
monic, c = 1; hence $p_1(t) = p_2(t)$. ■
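The proof of Theorem 7.12(a) rests on the division algorithm. As a small illustration (a Python sketch; `poly_divmod` is a hypothetical name, and polynomials are coefficient lists with the constant term first), dividing the characteristic polynomial $-(t-2)^2(t-3) = -t^3 + 7t^2 - 16t + 12$ of the matrix in Example 1 below by its minimal polynomial $(t-2)(t-3) = t^2 - 5t + 6$ leaves remainder zero, as part (a) predicts.

```python
from fractions import Fraction


def poly_divmod(g, p):
    """Return (q, r) with g = q*p + r and deg r < deg p.

    Polynomials are coefficient lists, constant term first; r == [] is the
    zero polynomial.
    """
    g = [Fraction(c) for c in g]
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    if not p:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(g) - len(p) + 1, 1)
    r = g[:]
    while r and r[-1] == 0:
        r.pop()
    while len(r) >= len(p):
        shift = len(r) - len(p)
        factor = r[-1] / p[-1]
        q[shift] = factor
        for i, c in enumerate(p):
            r[shift + i] -= factor * c
        # The leading term is now cancelled; drop it and any new zero terms.
        while r and r[-1] == 0:
            r.pop()
    return q, r


q, r = poly_divmod([12, -16, 7, -1], [6, -5, 1])
print([str(c) for c in q], r)  # ['2', '-1'] []  i.e. q(t) = -t + 2, r(t) = 0
```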


The minimal polynomial of a linear operator has an obvious analog for a 
matrix. 


Definition. Let $A \in M_{n\times n}(F)$. The minimal polynomial p(t) of A is
the monic polynomial of least positive degree for which $p(A) = O$.


The following results are now immediate. 


Theorem 7.13. Let T be a linear operator on a finite-dimensional vector
space V, and let β be an ordered basis for V. Then the minimal polynomial
of T is the same as the minimal polynomial of $[T]_\beta$.


Proof. Exercise. ■


Corollary. For any $A \in M_{n\times n}(F)$, the minimal polynomial of A is the
same as the minimal polynomial of $L_A$.


Proof. Exercise. ■


In view of the preceding theorem and corollary, Theorem 7.12 and all 
subsequent theorems in this section that are stated for operators are also 
valid for matrices. 

For the remainder of this section, we study primarily minimal polynomials 
of operators (and hence matrices) whose characteristic polynomials split. A 
more general treatment of minimal polynomials is given in Section 7.4. 


Theorem 7.14. Let T be a linear operator on a finite-dimensional vector
space V, and let p(t) be the minimal polynomial of T. A scalar λ is an
eigenvalue of T if and only if $p(\lambda) = 0$. Hence the characteristic polynomial
and the minimal polynomial of T have the same zeros.


Proof. Let f(t) be the characteristic polynomial of T. Since p(t) divides
f(t), there exists a polynomial q(t) such that $f(t) = q(t)p(t)$. If λ is a zero of
p(t), then

$$f(\lambda) = q(\lambda)p(\lambda) = q(\lambda) \cdot 0 = 0.$$

So λ is a zero of f(t); that is, λ is an eigenvalue of T.
Conversely, suppose that λ is an eigenvalue of T, and let x ∈ V be an
eigenvector corresponding to λ. By Exercise 22 of Section 5.1, we have

$$0 = T_0(x) = p(T)(x) = p(\lambda)x.$$

Since x ≠ 0, it follows that $p(\lambda) = 0$, and so λ is a zero of p(t). ■

The following corollary is immediate.


Corollary. Let T be a linear operator on a finite-dimensional vector space
V with minimal polynomial p(t) and characteristic polynomial f(t). Suppose
that f(t) factors as

$$f(t) = (\lambda_1 - t)^{n_1}(\lambda_2 - t)^{n_2} \cdots (\lambda_k - t)^{n_k},$$

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the distinct eigenvalues of T. Then there exist integers
$m_1, m_2, \ldots, m_k$ such that $1 \le m_i \le n_i$ for all i and

$$p(t) = (t - \lambda_1)^{m_1}(t - \lambda_2)^{m_2} \cdots (t - \lambda_k)^{m_k}.$$


Example 1
We compute the minimal polynomial of the matrix

$$A = \begin{pmatrix} 3 & -1 & 0 \\ 0 & 2 & 0 \\ 1 & -1 & 2 \end{pmatrix}.$$

Since A has the characteristic polynomial

$$f(t) = \det\begin{pmatrix} 3-t & -1 & 0 \\ 0 & 2-t & 0 \\ 1 & -1 & 2-t \end{pmatrix} = -(t-2)^2(t-3),$$

the minimal polynomial of A must be either $(t-2)(t-3)$ or $(t-2)^2(t-3)$
by the corollary to Theorem 7.14. Substituting A into $p(t) = (t-2)(t-3)$,
we find that $p(A) = O$; hence p(t) is the minimal polynomial of A.
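The search in Example 1 generalizes directly: by the corollary to Theorem 7.14, one tries exponent vectors $(m_1, \ldots, m_k)$ with $1 \le m_i \le n_i$ in order of increasing total degree and keeps the first product $\prod_i (A - \lambda_i I)^{m_i}$ that equals O. The Python sketch below assumes the distinct eigenvalues and their multiplicities have already been read off a split characteristic polynomial; `minimal_exponents` is a hypothetical name.

```python
from itertools import product


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def minimal_exponents(A, multiplicities):
    """Exponents m_i of the minimal polynomial prod (t - lam_i)^(m_i).

    `multiplicities` maps each distinct eigenvalue lam_i to its multiplicity
    n_i in the (split) characteristic polynomial.
    """
    n = len(A)
    eigs = list(multiplicities)
    shifted = {lam: [[A[i][j] - (lam if i == j else 0) for j in range(n)]
                     for i in range(n)] for lam in eigs}
    candidates = product(*(range(1, multiplicities[lam] + 1) for lam in eigs))
    for exps in sorted(candidates, key=sum):  # lowest total degree first
        M = [[int(i == j) for j in range(n)] for i in range(n)]
        for lam, m in zip(eigs, exps):
            for _ in range(m):
                M = matmul(M, shifted[lam])
        if all(x == 0 for row in M for x in row):
            return dict(zip(eigs, exps))
    raise AssertionError("unreachable: f(A) = O by Cayley-Hamilton")


# Example 1: f(t) = -(t - 2)^2 (t - 3)
print(minimal_exponents([[3, -1, 0], [0, 2, 0], [1, -1, 2]], {2: 2, 3: 1}))
# {2: 1, 3: 1}, i.e. p(t) = (t - 2)(t - 3)
```

Only one candidate of the least working degree can succeed, since the minimal polynomial is unique (Theorem 7.12).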


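Example 2
Let T be the linear operator on $R^2$ defined by

$$T(a, b) = (2a + 5b, 6a + b)$$

and β be the standard ordered basis for $R^2$. Then

$$[T]_\beta = \begin{pmatrix} 2 & 5 \\ 6 & 1 \end{pmatrix},$$

and hence the characteristic polynomial of T is

$$f(t) = \det\begin{pmatrix} 2-t & 5 \\ 6 & 1-t \end{pmatrix} = (t-7)(t+4).$$

Since each eigenvalue occurs with multiplicity 1, the corollary to Theorem 7.14
leaves only one possibility: the minimal polynomial of T is also $(t-7)(t+4)$.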
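For a 2 × 2 matrix the characteristic polynomial is $t^2 - \operatorname{tr}(A)\,t + \det(A)$, so Example 2 can be checked in a few lines. The Python sketch below (names hypothetical) recovers $t^2 - 3t - 28$ and confirms that $([T]_\beta - 7I)([T]_\beta + 4I) = O$.

```python
def char_poly_2x2(A):
    """Coefficients [c0, c1, c2] of det(A - tI) = t^2 - tr(A) t + det(A)."""
    (a, b), (c, d) = A
    return [a * d - b * c, -(a + d), 1]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[2, 5], [6, 1]]  # [T]_beta for T(a, b) = (2a + 5b, 6a + b)
coeffs = char_poly_2x2(A)
print(coeffs)  # [-28, -3, 1], i.e. t^2 - 3t - 28 = (t - 7)(t + 4)

# Both eigenvalues are roots of the characteristic polynomial ...
assert all(coeffs[0] + coeffs[1] * r + coeffs[2] * r * r == 0 for r in (7, -4))

# ... and the product of the two linear factors annihilates [T]_beta.
A_minus_7 = [[A[i][j] - 7 * (i == j) for j in range(2)] for i in range(2)]
A_plus_4 = [[A[i][j] + 4 * (i == j) for j in range(2)] for i in range(2)]
print(matmul(A_minus_7, A_plus_4))  # [[0, 0], [0, 0]]
```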
Example 3


Let D be the linear operator on $P_2(R)$ defined by $D(g(x)) = g'(x)$, the derivative
of g(x). We compute the minimal polynomial of D. Let β be the standard
ordered basis for $P_2(R)$. Then

$$[D]_\beta = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix},$$

and it follows that the characteristic polynomial of D is $-t^3$. So by the
corollary to Theorem 7.14, the minimal polynomial of D is t, $t^2$, or $t^3$. Since
$D^2(x^2) = 2 \ne 0$, it follows that $D^2 \ne T_0$; hence the minimal polynomial of D
must be $t^3$.
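The key step in Example 3 is that $D^2 \ne T_0$ while $D^3 = T_0$ on $P_2(R)$. A minimal Python sketch, representing $a_0 + a_1x + a_2x^2$ by the list `[a0, a1, a2]` (an assumption of this illustration; the helper names are hypothetical), makes both facts concrete.

```python
def derivative(coeffs):
    """Derivative of a0 + a1 x + a2 x^2 + ..., coefficients constant term first."""
    return [k * a for k, a in enumerate(coeffs)][1:]


def apply_power(f, coeffs, k):
    """Apply the map f to coeffs k times."""
    for _ in range(k):
        coeffs = f(coeffs)
    return coeffs


def is_zero_poly(coeffs):
    return all(a == 0 for a in coeffs)


x_squared = [0, 0, 1]
print(apply_power(derivative, x_squared, 2))  # [2], so D^2(x^2) = 2 is not 0

# D^3 kills the standard basis {1, x, x^2} of P2(R), so D^3 = T_0 on P2(R).
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(all(is_zero_poly(apply_power(derivative, p, 3)) for p in basis))  # True
```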


In Example 3, it is easily verified that P2(R) is a D-cyclic subspace (of 
itself). Here the minimal and characteristic polynomials are of the same 
degree. This is no coincidence. 


Theorem 7.15. Let T be a linear operator on an n-dimensional vector
space V such that V is a T-cyclic subspace of itself. Then the characteristic
polynomial f(t) and the minimal polynomial p(t) have the same degree, and
hence $f(t) = (-1)^n p(t)$.


Proof. Since V is a T-cyclic space, there exists an x ∈ V such that

$$\beta = \{x, T(x), \ldots, T^{n-1}(x)\}$$

is a basis for V (Theorem 5.22 p. 315). Let

$$g(t) = a_0 + a_1t + \cdots + a_kt^k$$

be a polynomial of degree k < n. Then $a_k \ne 0$ and

$$g(T)(x) = a_0x + a_1T(x) + \cdots + a_kT^k(x),$$

and so g(T)(x) is a linear combination of the vectors of β having at least one
nonzero coefficient, namely, $a_k$. Since β is linearly independent, it follows
that $g(T)(x) \ne 0$; hence $g(T) \ne T_0$. Therefore the minimal polynomial of T
has degree n, which is also the degree of the characteristic polynomial of T.
Since p(t) divides f(t) by Theorem 7.12 and the two have the same degree, f(t)
is a scalar multiple of p(t); comparing leading coefficients ($(-1)^n$ for f(t), 1 for
the monic p(t)) gives $f(t) = (-1)^n p(t)$. ■


Theorem 7.15 gives a condition under which the degree of the minimal 
polynomial of an operator is as large as possible. We now investigate the 
other extreme. By Theorem 7.14, the degree of the minimal polynomial of an 
operator must be greater than or equal to the number of distinct eigenvalues 
of the operator. The next result shows that the operators for which the 
degree of the minimal polynomial is as small as possible are precisely the 
diagonalizable operators. 
Theorem 7.16. Let T be a linear operator on a finite-dimensional vector
space V. Then T is diagonalizable if and only if the minimal polynomial of T
is of the form

$$p(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_k),$$

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the distinct eigenvalues of T.


Proof. Suppose that T is diagonalizable. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the distinct
eigenvalues of T, and define

$$p(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_k).$$

By Theorem 7.14, p(t) divides the minimal polynomial of T. Let $\beta =
\{v_1, v_2, \ldots, v_n\}$ be a basis for V consisting of eigenvectors of T, and consider
any $v_i \in \beta$. Then $(T - \lambda_j I)(v_i) = 0$ for some eigenvalue $\lambda_j$. Since $t - \lambda_j$
divides p(t), there is a polynomial $q_j(t)$ such that $p(t) = q_j(t)(t - \lambda_j)$. Hence

$$p(T)(v_i) = q_j(T)(T - \lambda_j I)(v_i) = 0.$$

It follows that $p(T) = T_0$, since p(T) takes each vector in a basis for V into
0. Therefore p(t) is the minimal polynomial of T.

Conversely, suppose that there are distinct scalars $\lambda_1, \lambda_2, \ldots, \lambda_k$ such that
the minimal polynomial p(t) of T factors as

$$p(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_k).$$

By Theorem 7.14, the $\lambda_i$'s are eigenvalues of T. We apply mathematical
induction on n = dim(V). Clearly T is diagonalizable for n = 1. Now
assume that T is diagonalizable whenever dim(V) < n for some n > 1, and
let dim(V) = n and $W = R(T - \lambda_k I)$. Obviously W ≠ V, because $\lambda_k$ is an
eigenvalue of T. If W = {0}, then $T = \lambda_k I$, which is clearly diagonalizable.
So suppose that 0 < dim(W) < n. Then W is T-invariant, and for any x ∈ W,
writing $x = (T - \lambda_k I)(y)$ with y ∈ V, we have

$$(T - \lambda_1 I)(T - \lambda_2 I) \cdots (T - \lambda_{k-1} I)(x) = p(T)(y) = 0.$$

It follows that the minimal polynomial of $T_W$ divides the polynomial
$(t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_{k-1})$. Hence by the induction hypothesis, $T_W$ is
diagonalizable. Furthermore, $\lambda_k$ is not an eigenvalue of $T_W$ by Theorem 7.14.
Therefore $W \cap N(T - \lambda_k I) = \{0\}$. Now let $\beta_1 = \{v_1, v_2, \ldots, v_m\}$ be a basis
for W consisting of eigenvectors of $T_W$ (and hence of T), and let $\beta_2 =
\{w_1, w_2, \ldots, w_p\}$ be a basis for $N(T - \lambda_k I)$, the eigenspace of T corresponding
to $\lambda_k$. Then $\beta_1$ and $\beta_2$ are disjoint by the previous comment. Moreover,
m + p = n by the dimension theorem applied to $T - \lambda_k I$. We show that
$\beta = \beta_1 \cup \beta_2$ is linearly independent. Consider scalars $a_1, a_2, \ldots, a_m$ and
$b_1, b_2, \ldots, b_p$ such that

$$a_1v_1 + a_2v_2 + \cdots + a_mv_m + b_1w_1 + b_2w_2 + \cdots + b_pw_p = 0.$$
Let

$$x = \sum_{i=1}^{m} a_iv_i \quad\text{and}\quad y = \sum_{i=1}^{p} b_iw_i.$$

Then x ∈ W, $y \in N(T - \lambda_k I)$, and x + y = 0. It follows that $x = -y \in
W \cap N(T - \lambda_k I)$, and therefore x = 0. Since $\beta_1$ is linearly independent, we
have that $a_1 = a_2 = \cdots = a_m = 0$. Similarly, $b_1 = b_2 = \cdots = b_p = 0$,
and we conclude that β is a linearly independent subset of V consisting of n
eigenvectors. It follows that β is a basis for V consisting of eigenvectors of T,
and consequently T is diagonalizable. ■
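Theorem 7.16 gives a practical diagonalizability test once the distinct eigenvalues are known: form $(A - \lambda_1 I)\cdots(A - \lambda_k I)$ and check whether it is O. The Python sketch below assumes the supplied list of distinct eigenvalues is complete (the characteristic polynomial is taken to split); `is_diagonalizable` is a hypothetical name. It is applied to the matrices A and B of Example 5 of Section 7.2, whose characteristic polynomial is $-(t-1)(t-2)^2$.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_diagonalizable(A, eigenvalues):
    """Theorem 7.16 test: with `eigenvalues` the complete list of distinct
    eigenvalues, A is diagonalizable iff prod (A - lam I) is the zero matrix."""
    n = len(A)
    M = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam in eigenvalues:
        shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
                   for i in range(n)]
        M = matmul(M, shifted)
    return all(x == 0 for row in M for x in row)


A = [[-3, 3, -2], [-7, 6, -3], [1, -1, 2]]   # Jordan form has a 2 x 2 block
B = [[0, 1, -1], [-4, 4, -2], [-2, 1, 1]]    # Jordan form is diagonal
print(is_diagonalizable(A, [1, 2]), is_diagonalizable(B, [1, 2]))  # False True
```

The test is only as good as the eigenvalue list: omitting an eigenvalue makes the product nonzero even for a diagonalizable matrix.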


In addition to the diagonalizable case, there are methods for determining
the minimal polynomial of any linear operator on a finite-dimensional
vector space. In the case that the characteristic polynomial of the operator 
splits, the minimal polynomial can be described using the Jordan canonical 
form of the operator. (See Exercise 13.) In the case that the characteristic 
polynomial does not split, the minimal polynomial can be described using the 
rational canonical form, which we study in the next section. (See Exercise 7 
of Section 7.4.) 


Example 4


We determine all matrices $A \in M_{2\times 2}(R)$ for which $A^2 - 3A + 2I = O$. Let
$g(t) = t^2 - 3t + 2 = (t-1)(t-2)$. Since $g(A) = O$, the minimal polynomial
p(t) of A divides g(t). Hence the only possible candidates for p(t) are t − 1,
t − 2, and (t − 1)(t − 2). If p(t) = t − 1 or p(t) = t − 2, then A = I or A = 2I,
respectively. If p(t) = (t − 1)(t − 2), then A is diagonalizable with eigenvalues
1 and 2, and hence A is similar to

$$\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$
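A quick numerical check of Example 4: I, 2I, and a matrix similar to diag(1, 2) all satisfy $A^2 - 3A + 2I = O$, while a matrix with minimal polynomial $(t-1)^2$ does not. In the Python sketch below, the similarity matrix Q is an arbitrary choice made only for illustration.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def g_of(A):
    """g(A) = A^2 - 3A + 2I for a 2 x 2 matrix A."""
    A2 = matmul(A, A)
    return [[A2[i][j] - 3 * A[i][j] + 2 * (i == j) for j in range(2)]
            for i in range(2)]


Q = [[1, 1], [0, 1]]          # illustrative invertible matrix
Q_inv = [[1, -1], [0, 1]]     # its inverse
A = matmul(matmul(Q, [[1, 0], [0, 2]]), Q_inv)  # similar to diag(1, 2)
print(A)                       # [[1, 1], [0, 2]]

for M in ([[1, 0], [0, 1]], [[2, 0], [0, 2]], A):
    print(g_of(M))             # [[0, 0], [0, 0]] each time

print(g_of([[1, 1], [0, 1]]))  # [[0, -1], [0, 0]]: minimal polynomial (t - 1)^2
```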
Example 5


Let $A \in M_{n\times n}(R)$ satisfy $A^3 = A$. We show that A is diagonalizable. Let
$g(t) = t^3 - t = t(t+1)(t-1)$. Then $g(A) = O$, and hence the minimal
polynomial p(t) of A divides g(t). Since g(t) has no repeated factors, neither
does p(t). Thus A is diagonalizable by Theorem 7.16.


Example 6 


In Example 3, we saw that the minimal polynomial of the differential operator
D on $P_2(R)$ is $t^3$. Hence, by Theorem 7.16, D is not diagonalizable.
EXERCISES


1. Label the following statements as true or false. Assume that all vector
spaces are finite-dimensional.


(a) Every linear operator T has a polynomial p(t) of largest degree for
which $p(T) = T_0$.

(b) Every linear operator has a unique minimal polynomial.

(c) The characteristic polynomial of a linear operator divides the minimal
polynomial of that operator.

(d) The minimal and the characteristic polynomials of any diagonalizable
operator are equal.

(e) Let T be a linear operator on an n-dimensional vector space V, p(t)
be the minimal polynomial of T, and f(t) be the characteristic
polynomial of T. Suppose that f(t) splits. Then f(t) divides
$[p(t)]^n$.

(f) The minimal polynomial of a linear operator always has the same
degree as the characteristic polynomial of the operator.

(g) A linear operator is diagonalizable if its minimal polynomial splits.

(h) Let T be a linear operator on a vector space V such that V is a
T-cyclic subspace of itself. Then the degree of the minimal polynomial
of T equals dim(V).

(i) Let T be a linear operator on a vector space V such that T has n
distinct eigenvalues, where n = dim(V). Then the degree of the
minimal polynomial of T equals n.


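2. Find the minimal polynomial of each of the following matrices.


(c) $\begin{pmatrix} 4 & -14 & 5 \\ 1 & -4 & 2 \\ 1 & -6 & 4 \end{pmatrix}$     (d) $\begin{pmatrix} 3 & 0 & 1 \\ 2 & 2 & 2 \\ -1 & 0 & 1 \end{pmatrix}$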


3. For each linear operator T on V, find the minimal polynomial of T.
(a) $V = R^2$ and $T(a, b) = (a + b, a - b)$

(b) $V = P_2(R)$ and $T(g(x)) = g'(x) + 2g(x)$

(c) $V = P_2(R)$ and $T(f(x)) = -xf''(x) + f'(x) + 2f(x)$

(d) $V = M_{n\times n}(R)$ and $T(A) = A^t$. Hint: Note that $T^2 = I$.


4. Determine which of the matrices and operators in Exercises 2 and 3 are
diagonalizable.


5. Describe all linear operators T on $R^2$ such that T is diagonalizable and
$T^3 - 2T^2 + T = T_0$.
6. Prove Theorem 7.13 and its corollary.
7. Prove the corollary to Theorem 7.14.


8. Let T be a linear operator on a finite-dimensional vector space, and let
p(t) be the minimal polynomial of T. Prove the following results.


(a) T is invertible if and only if $p(0) \ne 0$.
(b) If T is invertible and $p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0$, then

$$T^{-1} = -\frac{1}{a_0}\left(T^{n-1} + a_{n-1}T^{n-2} + \cdots + a_2T + a_1I\right).$$
9. Let T be a diagonalizable linear operator on a finite-dimensional vector 
space V. Prove that V is a T-cyclic subspace if and only if each of the 
eigenspaces of T is one-dimensional. 


10. Let T be a linear operator on a finite-dimensional vector space V, and 
suppose that W is a T-invariant subspace of V. Prove that the minimal 
polynomial of $T_W$ divides the minimal polynomial of T.


11. Let g(t) be the auxiliary polynomial associated with a homogeneous lin- 
ear differential equation with constant coefficients (as defined in Section 
2.7), and let V denote the solution space of this differential equation. 
Prove the following results. 


(a) V is a D-invariant subspace, where D is the differentiation operator
on $C^\infty$.

(b) The minimal polynomial of $D_V$ (the restriction of D to V) is g(t).

(c) If the degree of g(t) is n, then the characteristic polynomial of $D_V$
is $(-1)^n g(t)$.

Hint: Use Theorem 2.32 (p. 135) for (b) and (c).


12. Let D be the differentiation operator on P(R), the space of polynomials
over R. Prove that there exists no nonzero polynomial g(t) for which
$g(D) = T_0$. Hence D has no minimal polynomial.


13. Let T be a linear operator on a finite-dimensional vector space, and
suppose that the characteristic polynomial of T splits. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$
be the distinct eigenvalues of T, and for each i let $p_i$ be the order of the
largest Jordan block corresponding to $\lambda_i$ in a Jordan canonical form of
T. Prove that the minimal polynomial of T is

$$(t - \lambda_1)^{p_1}(t - \lambda_2)^{p_2} \cdots (t - \lambda_k)^{p_k}.$$


The following exercise requires knowledge of direct sums (see Section 5.2). 
14. Let T be a linear operator on a finite-dimensional vector space V, and
let $W_1$ and $W_2$ be T-invariant subspaces of V such that $V = W_1 \oplus W_2$.
Suppose that $p_1(t)$ and $p_2(t)$ are the minimal polynomials of $T_{W_1}$ and
$T_{W_2}$, respectively. Prove or disprove that $p_1(t)p_2(t)$ is the minimal
polynomial of T.


Exercise 15 uses the following definition. 


Definition. Let T be a linear operator on a finite-dimensional vector 
space V, and let x be a nonzero vector in V. The polynomial p(t) is called 
a T-annihilator of x if p(t) is a monic polynomial of least degree for which 
p(T)(x) = 0. 


15. Let T be a linear operator on a finite-dimensional vector space V, and
let x be a nonzero vector in V. Prove the following results.


(a) The vector x has a unique T-annihilator.

(b) The T-annihilator of x divides any polynomial g(t) for which
$g(T) = T_0$.

(c) If p(t) is the T-annihilator of x and W is the T-cyclic subspace
generated by x, then p(t) is the minimal polynomial of $T_W$, and
dim(W) equals the degree of p(t).

(d) The degree of the T-annihilator of x is 1 if and only if x is an
eigenvector of T.


16. Let T be a linear operator on a finite-dimensional vector space V, and let
$W_1$ be a T-invariant subspace of V. Let x ∈ V such that $x \notin W_1$. Prove
the following results.


(a) There exists a unique monic polynomial $g_1(t)$ of least positive degree
such that $g_1(T)(x) \in W_1$.

(b) If h(t) is a polynomial for which $h(T)(x) \in W_1$, then $g_1(t)$ divides
h(t).

(c) $g_1(t)$ divides the minimal and the characteristic polynomials of T.

(d) Let $W_2$ be a T-invariant subspace of V such that $W_2 \subseteq W_1$, and
let $g_2(t)$ be the unique monic polynomial of least degree such that
$g_2(T)(x) \in W_2$. Then $g_1(t)$ divides $g_2(t)$.
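The identity in Exercise 8(b) can be tried on the invertible matrix of Example 1, whose minimal polynomial is $t^2 - 5t + 6$ (so $a_1 = -5$ and $a_0 = 6$); the formula then reads $A^{-1} = -\frac{1}{6}(A - 5I)$. A short Python check with exact fractions, offered as a sanity check rather than a proof:

```python
from fractions import Fraction

A = [[3, -1, 0], [0, 2, 0], [1, -1, 2]]   # matrix of Example 1
n = len(A)
I = [[int(i == j) for j in range(n)] for i in range(n)]

# Minimal polynomial t^2 + a1*t + a0 with a1 = -5 and a0 = 6 (Example 1),
# so Exercise 8(b) gives A^{-1} = -(1/a0) * (A + a1*I).
a1, a0 = -5, 6
A_inv = [[Fraction(-1, a0) * (A[i][j] + a1 * I[i][j]) for j in range(n)]
         for i in range(n)]

product = [[sum(A[i][k] * A_inv[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
print(product == I)  # True
```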


7.4* THE RATIONAL CANONICAL FORM


Until now we have used eigenvalues, eigenvectors, and generalized eigenvec- 
tors in our analysis of linear operators with characteristic polynomials that 
split. In general, characteristic polynomials need not split, and indeed, oper- 
ators need not have eigenvalues! However, the unique factorization theorem 
for polynomials (see Appendix E) guarantees that the characteristic polyno- 
mial f(t) of any linear operator T on an n-dimensional vector space factors 
uniquely as

$$f(t) = (-1)^n(\phi_1(t))^{n_1}(\phi_2(t))^{n_2} \cdots (\phi_k(t))^{n_k},$$

where the $\phi_i(t)$'s ($1 \le i \le k$) are distinct irreducible monic polynomials and
the $n_i$'s are positive integers. In the case that f(t) splits, each irreducible
monic polynomial factor is of the form $\phi_i(t) = t - \lambda_i$, where $\lambda_i$ is an eigenvalue
of T, and there is a one-to-one correspondence between eigenvalues of T and 
the irreducible monic factors of the characteristic polynomial. In general, 
eigenvalues need not exist, but the irreducible monic factors always exist. In 
this section, we establish structure theorems based on the irreducible monic 
factors of the characteristic polynomial instead of eigenvalues. 

In this context, the following definition is the appropriate replacement for 
eigenspace and generalized eigenspace. 


Definition. Let T be a linear operator on a finite-dimensional vector
space V with characteristic polynomial

$$f(t) = (-1)^n(\phi_1(t))^{n_1}(\phi_2(t))^{n_2} \cdots (\phi_k(t))^{n_k},$$

where the $\phi_i(t)$'s ($1 \le i \le k$) are distinct irreducible monic polynomials and
the $n_i$'s are positive integers. For $1 \le i \le k$, we define the subset $K_{\phi_i}$ of V by

$$K_{\phi_i} = \{x \in V : (\phi_i(T))^p(x) = 0 \text{ for some positive integer } p\}.$$

We show that each $K_{\phi_i}$ is a nonzero T-invariant subspace of V. Note that
if $\phi_i(t) = t - \lambda$ is of degree one, then $K_{\phi_i}$ is the generalized eigenspace of T
corresponding to the eigenvalue λ.

Having obtained suitable generalizations of the related concepts of eigen- 
value and eigenspace, our next task is to describe a canonical form of a linear 
operator suitable to this context. The one that we study is called the rational 
canonical form. Since a canonical form is a description of a matrix represen- 
tation of a linear operator, it can be defined by specifying the form of the 
ordered bases allowed for these representations. 

Here the bases of interest naturally arise from the generators of certain 
cyclic subspaces. For this reason, the reader should recall the definition of 
a T-cyclic subspace generated by a vector and Theorem 5.22 (p. 315). We 
briefly review this concept and introduce some new notation and terminology. 

Let T be a linear operator on a finite-dimensional vector space V, and let x be a nonzero vector in V. We use the notation $C_x$ for the T-cyclic subspace generated by x. Recall (Theorem 5.22) that if $\dim(C_x) = k$, then the set

$$\{x, T(x), T^2(x), \ldots, T^{k-1}(x)\}$$

is an ordered basis for $C_x$. To distinguish this basis from all other ordered bases for $C_x$, we call it the T-cyclic basis generated by x and denote it by $\beta_x$. Let A be the matrix representation of the restriction of T to $C_x$ relative to the ordered basis $\beta_x$. Recall from the proof of Theorem 5.22 that

$$A = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{k-1} \end{pmatrix},$$

where

$$a_0x + a_1T(x) + \cdots + a_{k-1}T^{k-1}(x) + T^k(x) = 0.$$

Furthermore, the characteristic polynomial of A is given by

$$\det(A - tI) = (-1)^k(a_0 + a_1t + \cdots + a_{k-1}t^{k-1} + t^k).$$

The matrix A is called the companion matrix of the monic polynomial $h(t) = a_0 + a_1t + \cdots + a_{k-1}t^{k-1} + t^k$. Every monic polynomial has a companion matrix, and the characteristic polynomial of the companion matrix of a monic polynomial $g(t)$ of degree k is equal to $(-1)^kg(t)$. (See Exercise 19 of Section 5.4.) By Theorem 7.15 (p. 519), the monic polynomial $h(t)$ is also the minimal polynomial of A. Since A is the matrix representation of the restriction of T to $C_x$, $h(t)$ is also the minimal polynomial of this restriction. By Exercise 15 of Section 7.3, $h(t)$ is also the T-annihilator of x.
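The companion-matrix facts above can be checked mechanically. The following is a minimal sketch in Python that uses exact rational arithmetic from the standard `fractions` module. The helper names `companion`, `matmul`, and `char_poly` are our own. `char_poly` computes $\det(tI - A)$ with the Faddeev–LeVerrier recurrence, a technique the text does not discuss. Because $\det(A - tI) = (-1)^k\det(tI - A)$, the claim reduces to showing that $\det(tI - A)$ reproduces $g(t)$ itself.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix of h(t) = a0 + a1*t + ... + a_{k-1}*t^(k-1) + t^k.

    `coeffs` is [a0, a1, ..., a_{k-1}]; the leading coefficient 1 is implicit.
    """
    k = len(coeffs)
    A = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        if i > 0:
            A[i][i - 1] = Fraction(1)        # ones on the subdiagonal
        A[i][k - 1] = -Fraction(coeffs[i])   # last column is (-a0, ..., -a_{k-1})
    return A


def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def char_poly(A):
    """Coefficients [c0, ..., c_{n-1}, 1] of det(tI - A), constant term first.

    Faddeev-LeVerrier: M_k = A M_{k-1} + c_{n-k+1} I, c_{n-k} = -tr(A M_k) / k.
    """
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    c = Fraction(1)
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs[n - k] = c
    return coeffs


# h(t) = t^4 - 4t^3 + 14t^2 - 20t + 25 (this polynomial reappears in Example 4)
h = [25, -20, 14, -4]
C = companion(h)
print(char_poly(C) == h + [1])   # True: det(tI - C) = h(t)
```

Exact `Fraction` arithmetic is used so that coefficient comparisons are equalities rather than floating-point approximations.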

It is the object of this section to prove that for every linear operator T on a finite-dimensional vector space V, there exists an ordered basis $\beta$ for V such that the matrix representation $[T]_\beta$ is of the form

$$\begin{pmatrix} C_1 & O & \cdots & O \\ O & C_2 & \cdots & O \\ \vdots & \vdots & & \vdots \\ O & O & \cdots & C_r \end{pmatrix},$$

where each $C_i$ is the companion matrix of a polynomial $(\phi(t))^m$ such that $\phi(t)$ is a monic irreducible divisor of the characteristic polynomial of T and m is a positive integer. A matrix representation of this kind is called a rational canonical form of T. We call the accompanying basis a rational canonical basis for T.

The next theorem is a simple consequence of the following lemma, which 
relies on the concept of T-annihilator, introduced in the Exercises of Sec- 
tion 7.3. 


Lemma. Let T be a linear operator on a finite-dimensional vector space V, let x be a nonzero vector in V, and suppose that the T-annihilator of x is of the form $(\phi(t))^p$ for some irreducible monic polynomial $\phi(t)$. Then $\phi(t)$ divides the minimal polynomial of T, and $x \in K_\phi$.

Proof. By Exercise 15(b) of Section 7.3, $(\phi(t))^p$ divides the minimal polynomial of T. Therefore $\phi(t)$ divides the minimal polynomial of T. Furthermore, $x \in K_\phi$ by the definition of $K_\phi$. ■

Theorem 7.17. Let T be a linear operator on a finite-dimensional vector space V, and let $\beta$ be an ordered basis for V. Then $\beta$ is a rational canonical basis for T if and only if $\beta$ is the disjoint union of T-cyclic bases $\beta_{v_i}$, where each $v_i$ lies in $K_\phi$ for some irreducible monic divisor $\phi(t)$ of the characteristic polynomial of T.

Proof. Exercise. ■


Example 1

Suppose that T is a linear operator on $R^8$ and

$$\beta = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8\}$$

is a rational canonical basis for T such that

$$C = [T]_\beta = \begin{pmatrix} 0 & -3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & -2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

is a rational canonical form of T. In this case, the submatrices $C_1$, $C_2$, and $C_3$ are the companion matrices of the polynomials $\phi_1(t)$, $(\phi_2(t))^2$, and $\phi_2(t)$, respectively, where

$$\phi_1(t) = t^2 - t + 3 \quad\text{and}\quad \phi_2(t) = t^2 + 1.$$

In the context of Theorem 7.17, $\beta$ is the disjoint union of the T-cyclic bases; that is,

$$\beta = \beta_{v_1} \cup \beta_{v_3} \cup \beta_{v_7} = \{v_1, v_2\} \cup \{v_3, v_4, v_5, v_6\} \cup \{v_7, v_8\}.$$

By Exercise 40 of Section 5.4, the characteristic polynomial $f(t)$ of T is the product of the characteristic polynomials of the companion matrices:

$$f(t) = \phi_1(t)(\phi_2(t))^2\phi_2(t) = \phi_1(t)(\phi_2(t))^3.$$
The rational canonical form C of the operator T in Example 1 is constructed from matrices of the form $C_i$, each of which is the companion matrix of some power of a monic irreducible divisor of the characteristic polynomial of T. Furthermore, each such divisor is used in this way at least once.

In the course of showing that every linear operator T on a finite-dimensional vector space has a rational canonical form C, we show that the companion matrices $C_i$ that constitute C are always constructed from powers of the monic irreducible divisors of the characteristic polynomial of T. A key role in our analysis is played by the subspaces $K_\phi$, where $\phi(t)$ is an irreducible monic divisor of the minimal polynomial of T. Since the minimal polynomial of an operator divides the characteristic polynomial of the operator, every irreducible divisor of the former is also an irreducible divisor of the latter. We eventually show that the converse is also true; that is, the minimal polynomial and the characteristic polynomial have the same irreducible divisors.

We begin with a result that lists several properties of irreducible divisors of the minimal polynomial. The reader is advised to review the definition of T-annihilator and the accompanying Exercise 15 of Section 7.3.


Theorem 7.18. Let T be a linear operator on a finite-dimensional vector space V, and suppose that

$$p(t) = (\phi_1(t))^{m_1}(\phi_2(t))^{m_2}\cdots(\phi_k(t))^{m_k}$$

is the minimal polynomial of T, where the $\phi_i(t)$'s $(1 \le i \le k)$ are the distinct irreducible monic factors of $p(t)$ and the $m_i$'s are positive integers. Then the following statements are true.
(a) $K_{\phi_i}$ is a nonzero T-invariant subspace of V for each i.
(b) If x is a nonzero vector in some $K_{\phi_i}$, then the T-annihilator of x is of the form $(\phi_i(t))^p$ for some integer p.
(c) $K_{\phi_i} \cap K_{\phi_j} = \{0\}$ for $i \ne j$.
(d) $K_{\phi_i}$ is invariant under $\phi_j(T)$ for $i \ne j$, and the restriction of $\phi_j(T)$ to $K_{\phi_i}$ is one-to-one and onto.
(e) $K_{\phi_i} = N((\phi_i(T))^{m_i})$ for each i.

Proof. If $k = 1$, then (a), (b), and (e) are obvious, while (c) and (d) are vacuously true. Now suppose that $k > 1$.

(a) The proof that $K_{\phi_i}$ is a T-invariant subspace of V is left as an exercise. Let $f_i(t)$ be the polynomial obtained from $p(t)$ by omitting the factor $(\phi_i(t))^{m_i}$. To prove that $K_{\phi_i}$ is nonzero, first observe that $f_i(t)$ is a proper divisor of $p(t)$; therefore there exists a vector $z \in V$ such that $x = f_i(T)(z) \ne 0$. Then $x \in K_{\phi_i}$ because

$$(\phi_i(T))^{m_i}(x) = (\phi_i(T))^{m_i}f_i(T)(z) = p(T)(z) = 0.$$

(b) Assume the hypothesis. Then $(\phi_i(T))^q(x) = 0$ for some positive integer q. Hence the T-annihilator of x divides $(\phi_i(t))^q$ by Exercise 15(b) of Section 7.3, and the result follows.
(c) Assume $i \ne j$. Let $x \in K_{\phi_i} \cap K_{\phi_j}$, and suppose that $x \ne 0$. By (b), the T-annihilator of x is a power of both $\phi_i(t)$ and $\phi_j(t)$. But this is impossible because $\phi_i(t)$ and $\phi_j(t)$ are relatively prime (see Appendix E). We conclude that $x = 0$.

(d) Assume $i \ne j$. Since $K_{\phi_i}$ is T-invariant, it is also $\phi_j(T)$-invariant. Suppose that $\phi_j(T)(x) = 0$ for some $x \in K_{\phi_i}$. Then $x \in K_{\phi_i} \cap K_{\phi_j} = \{0\}$ by (c). Therefore the restriction of $\phi_j(T)$ to $K_{\phi_i}$ is one-to-one. Since V is finite-dimensional, this restriction is also onto.

(e) Suppose that $1 \le i \le k$. Clearly, $N((\phi_i(T))^{m_i}) \subseteq K_{\phi_i}$. Let $f_i(t)$ be the polynomial defined in (a). Since $f_i(t)$ is a product of polynomials of the form $\phi_j(t)$ for $j \ne i$, we have by (d) that the restriction of $f_i(T)$ to $K_{\phi_i}$ is onto. Let $x \in K_{\phi_i}$. Then there exists $y \in K_{\phi_i}$ such that $f_i(T)(y) = x$. Therefore

$$(\phi_i(T))^{m_i}(x) = (\phi_i(T))^{m_i}f_i(T)(y) = p(T)(y) = 0,$$

and hence $x \in N((\phi_i(T))^{m_i})$. Thus $K_{\phi_i} = N((\phi_i(T))^{m_i})$. ■
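Part (e) makes $K_{\phi_i}$ computable as a null space. The following sketch uses exact arithmetic, and its helper names are our own. It computes $\dim N((\phi(C))^m)$ for the matrix C of Example 1. The minimal polynomial of C is $\phi_1(t)(\phi_2(t))^2$, the least common multiple of the polynomials whose companion matrices are the diagonal blocks. For $\phi_2$, the nullity grows until $m = 2$ and then stays fixed at $\dim(K_{\phi_2})$.

```python
from fractions import Fraction

# The rational canonical form C of Example 1, entered row by row.
C = [
    [0, -3, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, -1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, -2, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, -1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]


def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def poly_at(coeffs, A):
    """c0*I + c1*A + ... + cr*A^r by Horner's rule (coefficients constant term first)."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += c
    return R


def rank(A):
    """Rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def nullity_of_power(phi, A, m):
    """dim N((phi(A))^m)."""
    P = poly_at(phi, A)
    Pm = P
    for _ in range(m - 1):
        Pm = matmul(Pm, P)
    return len(A) - rank(Pm)


phi1 = [3, -1, 1]   # t^2 - t + 3, exponent 1 in the minimal polynomial
phi2 = [1, 0, 1]    # t^2 + 1, exponent 2 in the minimal polynomial
print([nullity_of_power(phi2, C, m) for m in (1, 2, 3)])  # [4, 6, 6]
print([nullity_of_power(phi1, C, m) for m in (1, 2)])     # [2, 2]
```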


Since a rational canonical basis for an operator T is obtained from a union of T-cyclic bases, we need to know when such a union is linearly independent. The next major result, Theorem 7.19, reduces this problem to the study of T-cyclic bases within $K_\phi$, where $\phi(t)$ is an irreducible monic divisor of the minimal polynomial of T. We begin with the following lemma.


Lemma. Let T be a linear operator on a finite-dimensional vector space V, and suppose that

$$p(t) = (\phi_1(t))^{m_1}(\phi_2(t))^{m_2}\cdots(\phi_k(t))^{m_k}$$

is the minimal polynomial of T, where the $\phi_i$'s $(1 \le i \le k)$ are the distinct irreducible monic factors of $p(t)$ and the $m_i$'s are positive integers. For $1 \le i \le k$, let $v_i \in K_{\phi_i}$ be such that

$$v_1 + v_2 + \cdots + v_k = 0. \tag{2}$$

Then $v_i = 0$ for all i.

Proof. The result is trivial if $k = 1$, so suppose that $k > 1$. Consider any i. Let $f_i(t)$ be the polynomial obtained from $p(t)$ by omitting the factor $(\phi_i(t))^{m_i}$. As a consequence of Theorem 7.18, $f_i(T)$ is one-to-one on $K_{\phi_i}$ (by part (d)), and $f_i(T)(v_j) = 0$ for $i \ne j$ (by part (e), since $(\phi_j(t))^{m_j}$ divides $f_i(t)$). Thus, applying $f_i(T)$ to (2), we obtain $f_i(T)(v_i) = 0$, from which it follows that $v_i = 0$. ■


Theorem 7.19. Let T be a linear operator on a finite-dimensional vector space V, and suppose that

$$p(t) = (\phi_1(t))^{m_1}(\phi_2(t))^{m_2}\cdots(\phi_k(t))^{m_k}$$

is the minimal polynomial of T, where the $\phi_i$'s $(1 \le i \le k)$ are the distinct irreducible monic factors of $p(t)$ and the $m_i$'s are positive integers. For $1 \le i \le k$, let $S_i$ be a linearly independent subset of $K_{\phi_i}$. Then
(a) $S_i \cap S_j = \varnothing$ for $i \ne j$;
(b) $S_1 \cup S_2 \cup \cdots \cup S_k$ is linearly independent.

Proof. If $k = 1$, then (a) is vacuously true and (b) is obvious. Now suppose that $k > 1$. Then (a) follows immediately from Theorem 7.18(c). Furthermore, the proof of (b) is identical to the proof of Theorem 5.8 (p. 267) with the eigenspaces replaced by the subspaces $K_{\phi_i}$. ■


In view of Theorem 7.19, we can focus on bases of individual spaces of the form $K_\phi$, where $\phi(t)$ is an irreducible monic divisor of the minimal polynomial of T. The next several results give us ways to construct bases for these spaces that are unions of T-cyclic bases. These results serve the dual purposes of leading to the existence theorem for the rational canonical form and of providing methods for constructing rational canonical bases.

For Theorems 7.20 and 7.21 and the latter's corollary, we fix a linear operator T on a finite-dimensional vector space V and an irreducible monic divisor $\phi(t)$ of the minimal polynomial of T.


Theorem 7.20. Let $v_1, v_2, \ldots, v_k$ be distinct vectors in $K_\phi$ such that

$$S_1 = \beta_{v_1} \cup \beta_{v_2} \cup \cdots \cup \beta_{v_k}$$

is linearly independent. For each i, choose $w_i \in V$ such that $\phi(T)(w_i) = v_i$. Then

$$S_2 = \beta_{w_1} \cup \beta_{w_2} \cup \cdots \cup \beta_{w_k}$$

is also linearly independent.

Proof. Consider any linear combination of vectors in $S_2$ that sums to zero, say,

$$\sum_{i=1}^{k}\sum_{j=0}^{n_i} a_{ij}T^j(w_i) = 0, \tag{3}$$

where $\beta_{w_i} = \{w_i, T(w_i), \ldots, T^{n_i}(w_i)\}$. For each i, let $f_i(t)$ be the polynomial defined by

$$f_i(t) = \sum_{j=0}^{n_i} a_{ij}t^j.$$

Then (3) can be rewritten as

$$\sum_{i=1}^{k} f_i(T)(w_i) = 0. \tag{4}$$

Apply $\phi(T)$ to both sides of (4) to obtain

$$\sum_{i=1}^{k}\phi(T)f_i(T)(w_i) = \sum_{i=1}^{k}f_i(T)\phi(T)(w_i) = \sum_{i=1}^{k}f_i(T)(v_i) = 0.$$

This last sum can be rewritten as a linear combination of the vectors in $S_1$ so that each $f_i(T)(v_i)$ is a linear combination of the vectors in $\beta_{v_i}$. Since $S_1$ is linearly independent, it follows that

$$f_i(T)(v_i) = 0 \quad\text{for all } i.$$

Therefore the T-annihilator of $v_i$ divides $f_i(t)$ for all i. (See Exercise 15 of Section 7.3.) By Theorem 7.18(b), $\phi(t)$ divides the T-annihilator of $v_i$, and hence $\phi(t)$ divides $f_i(t)$ for all i. Thus, for each i, there exists a polynomial $g_i(t)$ such that $f_i(t) = g_i(t)\phi(t)$. So (4) becomes

$$\sum_{i=1}^{k} g_i(T)\phi(T)(w_i) = \sum_{i=1}^{k} g_i(T)(v_i) = 0.$$

Again, linear independence of $S_1$ requires that

$$f_i(T)(w_i) = g_i(T)(v_i) = 0 \quad\text{for all } i.$$

But $f_i(T)(w_i)$ is the result of grouping the terms of the linear combination in (3) that arise from the linearly independent set $\beta_{w_i}$. We conclude that for each i, $a_{ij} = 0$ for all j. Therefore $S_2$ is linearly independent. ■


We now show that $K_\phi$ has a basis consisting of a union of T-cyclic bases.

Lemma. Let W be a T-invariant subspace of $K_\phi$, and let $\beta$ be a basis for W. Then the following statements are true.
(a) Suppose that $x \in N(\phi(T))$, but $x \notin W$. Then $\beta \cup \beta_x$ is linearly independent.
(b) For some $w_1, w_2, \ldots, w_s$ in $N(\phi(T))$, $\beta$ can be extended to the linearly independent set

$$\beta' = \beta \cup \beta_{w_1} \cup \beta_{w_2} \cup \cdots \cup \beta_{w_s}$$

whose span contains $N(\phi(T))$.

Proof. (a) Let $\beta = \{v_1, v_2, \ldots, v_k\}$, and suppose that

$$\sum_{i=1}^{k} a_iv_i + z = 0 \quad\text{and}\quad z = \sum_{j=0}^{d-1} b_jT^j(x),$$

where d is the degree of $\phi(t)$. Then $z \in C_x \cap W$, and hence $C_z \subseteq C_x \cap W$. Suppose that $z \ne 0$. Then z has $\phi(t)$ as its T-annihilator (z is a nonzero vector of $C_x \subseteq N(\phi(T))$, and $\phi(t)$ is irreducible), and therefore

$$d = \dim(C_z) \le \dim(C_x \cap W) \le \dim(C_x) = d.$$

It follows that $C_x \cap W = C_x$, and consequently $x \in W$, contrary to hypothesis. Therefore $z = 0$, from which it follows that $b_j = 0$ for all j. Since $\beta$ is linearly independent, it follows that $a_i = 0$ for all i. Thus $\beta \cup \beta_x$ is linearly independent.

(b) Suppose that W does not contain $N(\phi(T))$. Choose a vector $w_1 \in N(\phi(T))$ that is not in W. By (a), $\beta_1 = \beta \cup \beta_{w_1}$ is linearly independent. Let $W_1 = \operatorname{span}(\beta_1)$. If $W_1$ does not contain $N(\phi(T))$, choose a vector $w_2$ in $N(\phi(T))$, but not in $W_1$, so that $\beta_2 = \beta_1 \cup \beta_{w_2} = \beta \cup \beta_{w_1} \cup \beta_{w_2}$ is linearly independent. Continuing this process, we eventually obtain vectors $w_1, w_2, \ldots, w_s$ in $N(\phi(T))$ such that the union

$$\beta' = \beta \cup \beta_{w_1} \cup \beta_{w_2} \cup \cdots \cup \beta_{w_s}$$

is a linearly independent set whose span contains $N(\phi(T))$. ■
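The proof of (b) is constructive. In the case $W = \{0\}$ used in Theorem 7.21, it becomes a greedy loop: run through a basis of $N(\phi(T))$ and adjoin $\beta_w$ whenever w is not yet in the span. The sketch below runs this loop in Python with exact arithmetic on the matrix A of Example 5, with $\phi(t) = t^2 + 2$. The function names are hypothetical. The basis of $N(A^2 + 2I)$ was worked out by hand, and the code asserts that each of its vectors is annihilated.

```python
from fractions import Fraction

# Matrix A of Example 5 and phi(t) = t^2 + 2 (degree d = 2).
A = [[0, 2, 0, -6, 2],
     [1, -2, 0, 0, 2],
     [1, 0, 1, -3, 2],
     [1, -2, 1, -1, 2],
     [1, -4, 3, -3, 4]]
d = 2


def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def phi_of_A(v):
    """phi(A)v = A^2 v + 2v."""
    return [a + 2 * x for a, x in zip(matvec(A, matvec(A, v)), v)]


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def cyclic_basis(w, k):
    """[w, Aw, ..., A^(k-1)w]; this is beta_w when the A-annihilator of w has degree k."""
    out = [list(w)]
    for _ in range(k - 1):
        out.append(matvec(A, out[-1]))
    return out


def extend_by_cyclic_bases(null_basis):
    """Lemma (b) with W = {0}: adjoin beta_w for each w not already in the span."""
    generators, basis = [], []
    for w in null_basis:
        if rank(basis + [w]) > rank(basis):   # w lies outside span(basis)
            generators.append(w)
            basis += cyclic_basis(w, d)
    return generators, basis


# A basis of N(A^2 + 2I), worked out by hand; the assertion confirms it.
null_basis = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 2, 1, 0], [0, 0, -1, 0, 1]]
assert all(phi_of_A(w) == [0] * 5 for w in null_basis)

generators, basis = extend_by_cyclic_bases(null_basis)
print(generators)     # the vectors w_1, ..., w_s that were used
print(rank(basis))    # 4 = len(basis), so the union of cyclic bases is independent
```

Part (a) of the lemma guarantees the independence that the final rank check only confirms. The greedy loop never has to backtrack.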


Theorem 7.21. If the minimal polynomial of T is of the form $p(t) = (\phi(t))^m$, then there exists a rational canonical basis for T.

Proof. The proof is by mathematical induction on m. Suppose that $m = 1$. Apply (b) of the lemma to $W = \{0\}$ to obtain a linearly independent subset of V of the form $\beta_{v_1} \cup \beta_{v_2} \cup \cdots \cup \beta_{v_k}$, whose span contains $N(\phi(T))$. Since $V = N(\phi(T))$, this set is a rational canonical basis for T.

Now suppose that, for some integer $m > 1$, the result is valid whenever the minimal polynomial of T is of the form $(\phi(t))^k$, where $k < m$, and assume that the minimal polynomial of T is $p(t) = (\phi(t))^m$. Let $r = \operatorname{rank}(\phi(T))$. Then $R(\phi(T))$ is a T-invariant subspace of V, and the restriction of T to this subspace has $(\phi(t))^{m-1}$ as its minimal polynomial. Therefore we may apply the induction hypothesis to obtain a rational canonical basis for the restriction of T to $R(\phi(T))$. Suppose that $v_1, v_2, \ldots, v_k$ are the generating vectors of the T-cyclic bases that constitute this rational canonical basis. For each i, choose $w_i$ in V such that $v_i = \phi(T)(w_i)$. By Theorem 7.20, the union $\beta$ of the sets $\beta_{w_i}$ is linearly independent. Let $W = \operatorname{span}(\beta)$. Then W contains $R(\phi(T))$, because W is T-invariant and contains each $v_i = \phi(T)(w_i)$, hence each $\beta_{v_i}$. Apply (b) of the lemma and adjoin additional T-cyclic bases $\beta_{w_{k+1}}, \beta_{w_{k+2}}, \ldots, \beta_{w_s}$ to $\beta$, if necessary, where $w_i$ is in $N(\phi(T))$ for $i > k$, to obtain a linearly independent set

$$\beta' = \beta_{w_1} \cup \beta_{w_2} \cup \cdots \cup \beta_{w_k} \cup \cdots \cup \beta_{w_s}$$

whose span $W'$ contains both W and $N(\phi(T))$.
We show that $W' = V$. Let U denote the restriction of $\phi(T)$ to $W'$, which is $\phi(T)$-invariant. By the way in which $W'$ was obtained from $R(\phi(T))$, it follows that $R(U) = R(\phi(T))$ and $N(U) = N(\phi(T))$. Therefore

$$\begin{aligned} \dim(W') &= \operatorname{rank}(U) + \operatorname{nullity}(U) \\ &= \operatorname{rank}(\phi(T)) + \operatorname{nullity}(\phi(T)) \\ &= \dim(V). \end{aligned}$$

Thus $W' = V$, and $\beta'$ is a rational canonical basis for T. ■

Corollary. $K_\phi$ has a basis consisting of the union of T-cyclic bases.

Proof. Apply Theorem 7.21 to the restriction of T to $K_\phi$. ■


We are now ready to study the general case.

Theorem 7.22. Every linear operator on a finite-dimensional vector space has a rational canonical basis and, hence, a rational canonical form.

Proof. Let T be a linear operator on a finite-dimensional vector space V, and let $p(t) = (\phi_1(t))^{m_1}(\phi_2(t))^{m_2}\cdots(\phi_k(t))^{m_k}$ be the minimal polynomial of T, where the $\phi_i(t)$'s are the distinct irreducible monic factors of $p(t)$ and $m_i > 0$ for all i. The proof is by mathematical induction on k. The case $k = 1$ is proved in Theorem 7.21.

Suppose that the result is valid whenever the minimal polynomial contains fewer than k distinct irreducible factors for some $k > 1$, and suppose that $p(t)$ contains k distinct factors. Let U be the restriction of T to the T-invariant subspace $W = R((\phi_k(T))^{m_k})$, and let $q(t)$ be the minimal polynomial of U. Then $q(t)$ divides $p(t)$ by Exercise 10 of Section 7.3. Furthermore, $\phi_k(t)$ does not divide $q(t)$. For otherwise, there would exist a nonzero vector $x \in W$ such that $\phi_k(U)(x) = 0$ and a vector $y \in V$ such that $x = (\phi_k(T))^{m_k}(y)$. It follows that $(\phi_k(T))^{m_k+1}(y) = 0$, and hence $y \in K_{\phi_k}$ and $x = (\phi_k(T))^{m_k}(y) = 0$ by Theorem 7.18(e), a contradiction. Thus $q(t)$ contains fewer than k distinct irreducible divisors. So by the induction hypothesis, U has a rational canonical basis $\beta_1$ consisting of a union of U-cyclic bases (and hence T-cyclic bases) of vectors from some of the subspaces $K_{\phi_i}$, $1 \le i \le k-1$. By the corollary to Theorem 7.21, $K_{\phi_k}$ has a basis $\beta_2$ consisting of a union of T-cyclic bases. By Theorem 7.19, $\beta_1$ and $\beta_2$ are disjoint, and $\beta = \beta_1 \cup \beta_2$ is linearly independent. Let s denote the number of vectors in $\beta$, and let $n = \dim(V)$. Then

$$\begin{aligned} s &= \dim(R((\phi_k(T))^{m_k})) + \dim(K_{\phi_k}) \\ &= \operatorname{rank}((\phi_k(T))^{m_k}) + \operatorname{nullity}((\phi_k(T))^{m_k}) \\ &= n. \end{aligned}$$

We conclude that $\beta$ is a basis for V. Therefore $\beta$ is a rational canonical basis, and T has a rational canonical form. ■
In our study of the rational canonical form, we relied on the minimal polynomial. We are now able to relate the rational canonical form to the characteristic polynomial.

Theorem 7.23. Let T be a linear operator on an n-dimensional vector space V with characteristic polynomial

$$f(t) = (-1)^n(\phi_1(t))^{n_1}(\phi_2(t))^{n_2}\cdots(\phi_k(t))^{n_k},$$

where the $\phi_i(t)$'s $(1 \le i \le k)$ are distinct irreducible monic polynomials and the $n_i$'s are positive integers. Then the following statements are true.
(a) $\phi_1(t), \phi_2(t), \ldots, \phi_k(t)$ are the irreducible monic factors of the minimal polynomial.
(b) For each i, $\dim(K_{\phi_i}) = d_in_i$, where $d_i$ is the degree of $\phi_i(t)$.
(c) If $\beta$ is a rational canonical basis for T, then $\beta_i = \beta \cap K_{\phi_i}$ is a basis for $K_{\phi_i}$ for each i.
(d) If $\gamma_i$ is a basis for $K_{\phi_i}$ for each i, then $\gamma = \gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is a basis for V. In particular, if each $\gamma_i$ is a disjoint union of T-cyclic bases, then $\gamma$ is a rational canonical basis for T.


Proof. (a) By Theorem 7.22, T has a rational canonical form C. By Exercise 40 of Section 5.4, the characteristic polynomial of C, and hence of T, is the product of the characteristic polynomials of the companion matrices that compose C. Therefore each irreducible monic divisor $\phi_i(t)$ of $f(t)$ divides the characteristic polynomial of at least one of the companion matrices, and hence for some integer p, $(\phi_i(t))^p$ is the T-annihilator of a nonzero vector of V. We conclude that $(\phi_i(t))^p$, and so $\phi_i(t)$, divides the minimal polynomial of T. Conversely, if $\phi(t)$ is an irreducible monic polynomial that divides the minimal polynomial of T, then $\phi(t)$ divides the characteristic polynomial of T because the minimal polynomial divides the characteristic polynomial.

(b), (c), and (d) Let $C = [T]_\beta$, which is a rational canonical form of T. Consider any i $(1 \le i \le k)$. Since $f(t)$ is the product of the characteristic polynomials of the companion matrices that compose C, we may multiply those characteristic polynomials that arise from the T-cyclic bases in $\beta_i$ to obtain the factor $(\phi_i(t))^{n_i}$ of $f(t)$. Since this polynomial has degree $n_id_i$, and the union of these bases is a linearly independent subset $\beta_i$ of $K_{\phi_i}$, we have

$$n_id_i \le \dim(K_{\phi_i}).$$

Furthermore, $n = \sum_{i=1}^{k} d_in_i$, because this sum is equal to the degree of $f(t)$. Now let s denote the number of vectors in $\gamma$. By Theorem 7.19, $\gamma$ is linearly independent, and therefore

$$n = \sum_{i=1}^{k} d_in_i \le \sum_{i=1}^{k} \dim(K_{\phi_i}) = s \le n.$$

Hence $n = s$, and $d_in_i = \dim(K_{\phi_i})$ for all i. It follows that $\gamma$ is a basis for V and $\beta_i$ is a basis for $K_{\phi_i}$ for each i. ■
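As a quick check of parts (b) and (c), consider Example 1. There $f(t) = \phi_1(t)(\phi_2(t))^3$ with $d_1 = d_2 = 2$, $n_1 = 1$, and $n_2 = 3$, so

$$\dim(K_{\phi_1}) = d_1n_1 = 2 \cdot 1 = 2, \qquad \dim(K_{\phi_2}) = d_2n_2 = 2 \cdot 3 = 6, \qquad d_1n_1 + d_2n_2 = 8 = \dim(R^8).$$

Accordingly, $\beta_1 = \beta \cap K_{\phi_1} = \{v_1, v_2\}$ and $\beta_2 = \beta \cap K_{\phi_2} = \{v_3, v_4, \ldots, v_8\}$, as part (c) predicts.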


Uniqueness of the Rational Canonical Form 


Having shown that a rational canonical form exists, we are now in a po- 
sition to ask about the extent to which it is unique. Certainly, the rational 
canonical form of a linear operator T can be modified by permuting the T- 
cyclic bases that constitute the corresponding rational canonical basis. This 
has the effect of permuting the companion matrices that make up the rational 
canonical form. As in the case of the Jordan canonical form, we show that 
except for these permutations, the rational canonical form is unique, although 
the rational canonical bases are not. 

To simplify this task, we adopt the convention of ordering every rational 
canonical basis so that all the T-cyclic bases associated with the same irre- 
ducible monic divisor of the characteristic polynomial are grouped together. 
Furthermore, within each such grouping, we arrange the T-cyclic bases in 
decreasing order of size. Our task is to show that, subject to this order, the 
rational canonical form of a linear operator is unique up to the arrangement 
of the irreducible monic divisors. 

As in the case of the Jordan canonical form, we introduce arrays of dots 
from which we can reconstruct the rational canonical form. For the Jordan 
canonical form, we devised a dot diagram for each eigenvalue of the given 
operator. In the case of the rational canonical form, we define a dot diagram 
for each irreducible monic divisor of the characteristic polynomial of the given 
operator. A proof that the resulting dot diagrams are completely determined 
by the operator is also a proof that the rational canonical form is unique. 

In what follows, T is a linear operator on a finite-dimensional vector space with rational canonical basis $\beta$; $\phi(t)$ is an irreducible monic divisor of the characteristic polynomial of T; $\beta_{v_1}, \beta_{v_2}, \ldots, \beta_{v_k}$ are the T-cyclic bases of $\beta$ that are contained in $K_\phi$; and d is the degree of $\phi(t)$. For each j, let $(\phi(t))^{p_j}$ be the annihilator of $v_j$. This polynomial has degree $dp_j$; therefore, by Exercise 15 of Section 7.3, $\beta_{v_j}$ contains $dp_j$ vectors. Furthermore, $p_1 \ge p_2 \ge \cdots \ge p_k$ since the T-cyclic bases are arranged in decreasing order of size. We define the dot diagram of $\phi(t)$ to be the array consisting of k columns of dots with $p_j$ dots in the jth column, arranged so that the jth column begins at the top and terminates after $p_j$ dots. For example, if $k = 3$, $p_1 = 4$, $p_2 = 2$, and $p_3 = 2$, then the dot diagram is

    • • •
    • • •
    •
    •

Although each column of a dot diagram corresponds to a T-cyclic basis $\beta_{v_j}$ in $K_\phi$, the column contains only $p_j$ dots, whereas the basis contains $dp_j$ vectors. So there are fewer dots in the column than there are vectors in the basis whenever $d > 1$.
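The definition counts dots by columns, but the rank formulas of Theorem 7.24 below count them by rows. The following sketch converts between the two; its function names are hypothetical. Given column lengths $p_1 \ge p_2 \ge \cdots \ge p_k$, the ith row has one dot for each j with $p_j \ge i$.

```python
def rows_from_columns(p):
    """Row lengths r_1, r_2, ... of the dot diagram whose jth column has p[j] dots.

    Assumes p is nonincreasing (p_1 >= p_2 >= ... >= p_k), as in the text.
    """
    height = max(p) if p else 0
    return [sum(1 for pj in p if pj >= i) for i in range(1, height + 1)]


def render(p):
    """Draw the diagram row by row, one '•' per dot, columns left to right."""
    height = max(p) if p else 0
    return "\n".join(" ".join("•" for pj in p if pj >= i) for i in range(1, height + 1))


print(rows_from_columns([4, 2, 2]))   # [3, 3, 1, 1]
print(render([4, 2, 2]))              # reproduces the example diagram above
```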


Example 2

Recall the linear operator T of Example 1 with the rational canonical basis $\beta$ and the rational canonical form $C = [T]_\beta$. Since there are two irreducible monic divisors of the characteristic polynomial of T, $\phi_1(t) = t^2 - t + 3$ and $\phi_2(t) = t^2 + 1$, there are two dot diagrams to consider. Because $\phi_1(t)$ is the T-annihilator of $v_1$ and $\beta_{v_1}$ is a basis for $K_{\phi_1}$, the dot diagram for $\phi_1(t)$ consists of a single dot. The other two T-cyclic bases, $\beta_{v_3}$ and $\beta_{v_7}$, lie in $K_{\phi_2}$. Since $v_3$ has T-annihilator $(\phi_2(t))^2$ and $v_7$ has T-annihilator $\phi_2(t)$, in the dot diagram of $\phi_2(t)$ we have $p_1 = 2$ and $p_2 = 1$. These diagrams are as follows:

Dot diagram for $\phi_1(t)$:

    •

Dot diagram for $\phi_2(t)$:

    • •
    •

In practice, we obtain the rational canonical form of a linear operator 
from the information provided by dot diagrams. This is illustrated in the 
next example. 


Example 3

Let T be a linear operator on a finite-dimensional vector space over R, and suppose that the irreducible monic divisors of the characteristic polynomial of T are

$$\phi_1(t) = t - 1, \quad \phi_2(t) = t^2 + 2, \quad\text{and}\quad \phi_3(t) = t^2 + t + 1.$$

Suppose, furthermore, that the dot diagrams associated with these divisors are as follows:

Diagram for $\phi_1(t)$:

    • •
    •

Diagram for $\phi_2(t)$:

    • •

Diagram for $\phi_3(t)$:

    •

Since the dot diagram for $\phi_1(t)$ has two columns, it contributes two companion matrices to the rational canonical form. The first column has two dots, and therefore corresponds to the $2 \times 2$ companion matrix of $(\phi_1(t))^2 = (t - 1)^2$. The second column, with only one dot, corresponds to the $1 \times 1$ companion matrix of $\phi_1(t) = t - 1$. These two companion matrices are given by

$$C_1 = \begin{pmatrix} 0 & -1 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad C_2 = (1).$$

The dot diagram for $\phi_2(t) = t^2 + 2$ consists of two columns, each containing a single dot; hence this diagram contributes two copies of the $2 \times 2$ companion matrix for $\phi_2(t)$, namely,

$$C_3 = C_4 = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}.$$

The dot diagram for $\phi_3(t) = t^2 + t + 1$ consists of a single column with a single dot contributing the single $2 \times 2$ companion matrix

$$C_5 = \begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}.$$

Therefore the rational canonical form of T is the $9 \times 9$ matrix

$$C = \begin{pmatrix} C_1 & O & O & O & O \\ O & C_2 & O & O & O \\ O & O & C_3 & O & O \\ O & O & O & C_4 & O \\ O & O & O & O & C_5 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \end{pmatrix}.$$
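Passing from dot diagrams to the rational canonical form, as in Example 3, is mechanical: each column of p dots in the diagram of $\phi(t)$ contributes the companion matrix of $(\phi(t))^p$. The sketch below rebuilds the $9 \times 9$ matrix C of Example 3 this way. Its helper names are hypothetical. Polynomials are listed constant term first, with the leading 1 included.

```python
def polymul(p, q):
    """Product of polynomials stored constant term first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def companion(monic):
    """Companion matrix of a monic polynomial [a0, ..., a_{k-1}, 1]."""
    k = len(monic) - 1
    A = [[0] * k for _ in range(k)]
    for i in range(k):
        if i > 0:
            A[i][i - 1] = 1
        A[i][k - 1] = -monic[i]
    return A


def block_diag(blocks):
    n = sum(len(B) for B in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for B in blocks:
        for i, row in enumerate(B):
            M[offset + i][offset:offset + len(row)] = row
        offset += len(B)
    return M


def rational_canonical_form(diagrams):
    """diagrams: list of (phi, column_lengths) pairs, phi monic, constant term first.

    Each column of p dots contributes the companion matrix of phi(t)^p; blocks
    appear in the order in which the divisors and columns are listed.
    """
    blocks = []
    for phi, columns in diagrams:
        for p in columns:
            power = [1]
            for _ in range(p):
                power = polymul(power, phi)
            blocks.append(companion(power))
    return block_diag(blocks)


# Example 3: phi1 = t - 1, phi2 = t^2 + 2, phi3 = t^2 + t + 1
C = rational_canonical_form([
    ([-1, 1], [2, 1]),      # diagram for phi1: columns of 2 dots and 1 dot
    ([2, 0, 1], [1, 1]),    # diagram for phi2: two columns of 1 dot
    ([1, 1, 1], [1]),       # diagram for phi3: a single dot
])
for row in C:
    print(row)
```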


We return to the general problem of finding dot diagrams. As we did before, we fix a linear operator T on a finite-dimensional vector space and an irreducible monic divisor $\phi(t)$ of the characteristic polynomial of T. Let U denote the restriction of the linear operator $\phi(T)$ to $K_\phi$. By Theorem 7.18(e), $U^q = T_0$ for some positive integer q. Consequently, by Exercise 12 of Section 7.2, the characteristic polynomial of U is $(-1)^mt^m$, where $m = \dim(K_\phi)$. Therefore $K_\phi$ is the generalized eigenspace of U corresponding to $\lambda = 0$, and U has a Jordan canonical form. The dot diagram associated with the Jordan canonical form of U gives us a key to understanding the dot diagram of T that is associated with $\phi(t)$. We now relate the two diagrams.

Let $\beta$ be a rational canonical basis for T, and let $\beta_{v_1}, \beta_{v_2}, \ldots, \beta_{v_k}$ be the T-cyclic bases of $\beta$ that are contained in $K_\phi$. Consider one of these T-cyclic bases $\beta_{v_j}$, and suppose again that the T-annihilator of $v_j$ is $(\phi(t))^{p_j}$. Then $\beta_{v_j}$ consists of $dp_j$ vectors in $\beta$. For $0 \le i < d$, let $\gamma_i$ be the cycle of generalized eigenvectors of U corresponding to $\lambda = 0$ with end vector $T^i(v_j)$, where $T^0(v_j) = v_j$. Then

$$\gamma_i = \{(\phi(T))^{p_j-1}T^i(v_j),\ (\phi(T))^{p_j-2}T^i(v_j),\ \ldots,\ (\phi(T))T^i(v_j),\ T^i(v_j)\}.$$

By Theorem 7.1 (p. 485), $\gamma_i$ is a linearly independent subset of $C_{v_j}$. Now let

$$\alpha_j = \gamma_0 \cup \gamma_1 \cup \cdots \cup \gamma_{d-1}.$$

Notice that $\alpha_j$ contains $p_jd$ vectors.


Lemma 1. $\alpha_j$ is an ordered basis for $C_{v_j}$.

Proof. The key to this proof is Theorem 7.4 (p. 487). Since $\alpha_j$ is the union of cycles of generalized eigenvectors of U corresponding to $\lambda = 0$, it suffices to show that the set of initial vectors of these cycles

$$\{(\phi(T))^{p_j-1}(v_j),\ (\phi(T))^{p_j-1}T(v_j),\ \ldots,\ (\phi(T))^{p_j-1}T^{d-1}(v_j)\}$$

is linearly independent. Consider any linear combination of these vectors

$$a_0(\phi(T))^{p_j-1}(v_j) + a_1(\phi(T))^{p_j-1}T(v_j) + \cdots + a_{d-1}(\phi(T))^{p_j-1}T^{d-1}(v_j),$$

where not all of the coefficients are zero. Let $g(t)$ be the polynomial defined by $g(t) = a_0 + a_1t + \cdots + a_{d-1}t^{d-1}$. Then $g(t)$ is a nonzero polynomial of degree less than d, and hence $(\phi(t))^{p_j-1}g(t)$ is a nonzero polynomial with degree less than $p_jd$. Since $(\phi(t))^{p_j}$ is the T-annihilator of $v_j$, it follows that $(\phi(T))^{p_j-1}g(T)(v_j) \ne 0$. Therefore the set of initial vectors is linearly independent. So by Theorem 7.4, $\alpha_j$ is linearly independent, and the $\gamma_i$'s are disjoint. Consequently, $\alpha_j$ consists of $p_jd$ linearly independent vectors in $C_{v_j}$, which has dimension $p_jd$. We conclude that $\alpha_j$ is a basis for $C_{v_j}$. ■


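Thus we may replace $\beta_{v_j}$ by $\alpha_j$ as a basis for $C_{v_j}$. We do this for each j to obtain a subset $\alpha = \alpha_1 \cup \alpha_2 \cup \cdots \cup \alpha_k$ of $K_\phi$.

Lemma 2. $\alpha$ is a Jordan canonical basis for $K_\phi$.

Proof. Since $\beta_{v_1} \cup \beta_{v_2} \cup \cdots \cup \beta_{v_k}$ is a basis for $K_\phi$, and since $\operatorname{span}(\alpha_i) = \operatorname{span}(\beta_{v_i}) = C_{v_i}$, Exercise 9 implies that $\alpha$ is a basis for $K_\phi$. Because $\alpha$ is a union of cycles of generalized eigenvectors of U, we conclude that $\alpha$ is a Jordan canonical basis. ■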
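Lemmas 1 and 2 can be checked numerically on the matrix A of Example 4, where $\phi(t) = t^2 - 2t + 5$ and $d = 2$. There the coordinate vector $e_3$ generates all of V and has annihilator $(\phi(t))^2$, so $p = 2$. The following sketch uses hypothetical names and exact integer arithmetic. It builds the cycles $\gamma_0$ and $\gamma_1$ of U exactly as in the text, then checks two things. First, their union $\alpha_1$ has rank 4, so it spans $C_{e_3} = R^4$. Second, U sends each initial vector to 0.

```python
from fractions import Fraction

# [D]_beta from Example 4; phi(t) = t^2 - 2t + 5 has degree d = 2.
A = [[1, 2, 1, 0],
     [-2, 1, 0, 1],
     [0, 0, 1, 2],
     [0, 0, -2, 1]]
d, p = 2, 2          # the generator below has annihilator phi(t)^2


def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def phi_of_A(v):
    """phi(A)v = A^2 v - 2Av + 5v, i.e. the operator U applied to v."""
    Av = matvec(A, v)
    return [a2 - 2 * a1 + 5 * x for a2, a1, x in zip(matvec(A, Av), Av, v)]


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


v = [0, 0, 1, 0]     # coordinates of x e^x cos 2x
alpha = []
for i in range(d):
    end = v
    for _ in range(i):
        end = matvec(A, end)                  # end vector T^i(v)
    cycle = [end]
    for _ in range(p - 1):
        cycle.insert(0, phi_of_A(cycle[0]))   # prepend U applied to the current initial vector
    alpha.append(cycle)                       # gamma_i, initial vector first

print(alpha)
print(rank([x for cycle in alpha for x in cycle]))   # 4
print([phi_of_A(cycle[0]) for cycle in alpha])        # U sends each initial vector to 0
```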


We are now in a position to relate the dot diagram of T corresponding to $\phi(t)$ to the dot diagram of U, bearing in mind that in the first case we are considering a rational canonical form and in the second case we are considering a Jordan canonical form. For convenience, we designate the first diagram $D_1$ and the second diagram $D_2$. For each j, the presence of the T-cyclic basis $\beta_{v_j}$ results in a column of $p_j$ dots in $D_1$. By Lemma 1, this basis is replaced by the union $\alpha_j$ of d cycles of generalized eigenvectors of U, each of length $p_j$, which becomes part of the Jordan canonical basis for U. In effect, $\alpha_j$ determines d columns each containing $p_j$ dots in $D_2$. So each column in $D_1$ determines d columns in $D_2$ of the same length, and all columns in $D_2$ are obtained in this way. Alternatively, each row in $D_2$ has d times as many dots as the corresponding row in $D_1$. Since Theorem 7.10 (p. 500) gives us the number of dots in any row of $D_2$, we may divide the appropriate expression in this theorem by d to obtain the number of dots in the corresponding row of $D_1$. Thus we have the following result.


Theorem 7.24. Let T be a linear operator on a finite-dimensional vector space V, let $\phi(t)$ be an irreducible monic divisor of the characteristic polynomial of T of degree d, and let $r_i$ denote the number of dots in the ith row of the dot diagram for $\phi(t)$ with respect to a rational canonical basis for T. Then
(a) $r_1 = \frac{1}{d}[\dim(V) - \operatorname{rank}(\phi(T))]$;
(b) $r_i = \frac{1}{d}[\operatorname{rank}((\phi(T))^{i-1}) - \operatorname{rank}((\phi(T))^i)]$ for $i > 1$.


Thus the dot diagrams associated with a rational canonical form of an op- 
erator are completely determined by the operator. Since the rational canoni- 
cal form is completely determined by its dot diagrams, we have the following 
uniqueness condition. 


Corollary. Under the conventions described earlier, the rational canonical 
form of a linear operator is unique up to the arrangement of the irreducible 
monic divisors of the characteristic polynomial. 
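Theorem 7.24 reduces the computation of dot diagrams to rank computations. The following sketch uses exact rational arithmetic, and all names in it are our own. It evaluates $\phi(A)$ by Horner's rule and records $r_i = \frac{1}{d}[\operatorname{rank}((\phi(A))^{i-1}) - \operatorname{rank}((\phi(A))^i)]$ until the ranks stop decreasing. The case $i = 1$ is part (a), because $(\phi(A))^0 = I$ has rank $\dim(V)$. Applied to the matrix C of Example 1, it reproduces the diagrams found in Example 2.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def poly_at(coeffs, A):
    """c0*I + c1*A + ... + cr*A^r by Horner's rule (constant term first)."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += c
    return R


def rank(A):
    """Rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def dot_diagram_rows(A, phi):
    """[r_1, r_2, ...] for the dot diagram of phi (monic, constant term first)."""
    n, d = len(A), len(phi) - 1
    P = poly_at(phi, A)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # phi(A)^0 = I
    rows, prev = [], n                  # prev = rank(phi(A)^(i-1)), starting at dim V
    while True:
        power = matmul(power, P)        # phi(A)^i
        r = rank(power)
        if r == prev:                   # ranks have stabilized: no further rows
            return rows
        rows.append((prev - r) // d)
        prev = r


# Rational canonical form C of Example 1.
C = [[0, -3, 0, 0, 0, 0, 0, 0],
     [1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, -1, 0, 0],
     [0, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, -2, 0, 0],
     [0, 0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, -1],
     [0, 0, 0, 0, 0, 0, 1, 0]]
print(dot_diagram_rows(C, [1, 0, 1]))    # phi2 = t^2 + 1:     [2, 1]
print(dot_diagram_rows(C, [3, -1, 1]))   # phi1 = t^2 - t + 3: [1]
```

The loop may stop at the first repeated rank because the ranks of successive powers of a single operator never increase. Once two consecutive ranks agree, all later ranks agree as well.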


Since the rational canonical form of a linear operator is unique, the poly- 
nomials corresponding to the companion matrices that determine this form 
are also unique. These polynomials, which are powers of the irreducible monic 
divisors, are called the elementary divisors of the linear operator. Since a 
companion matrix may occur more than once in a rational canonical form, 
the same is true for the elementary divisors. We call the number of such 
occurrences the multiplicity of the elementary divisor. 

Conversely, the elementary divisors and their multiplicities determine the 
companion matrices and, therefore, the rational canonical form of a linear 
operator. 


Example 4 
Let 


GB = {e* cos 2x, e” sin 2x, xe” cos 2x, xe” sin 2a} 
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be viewed as a subset of F(R, R), the space of all real-valued functions defined 
on R, and let V = span(). Then V is a four-dimensional subspace of F(R, R), 
and @ is an ordered basis for V. Let D be the linear operator on V defined by 
D(y) = y’, the derivative of y, and let A = [D]g. Then 


0 
1 
2 7 
1 
and the characteristic polynomial of D, and hence of A, is 
fH = —2t +5)". 
Thus ¢(t) = t? —2t+5 is the only irreducible monic divisor of f(t). Since ¢(t) 
has degree 2 and V is four-dimensional, the dot diagram for ¢(t) contains only 
two dots. Therefore the dot diagram is determined by r1, the number of dots 


in the first row. Because ranks are preserved under matrix representations, 
we can use A in place of D in the formula given in Theorem 7.24. Now 


(A) = 


oOoCc oO 
ooOoCc oO 


and so 
my = $[4—rank(4(A))] = 34-2) = 1. 


It follows that the second dot lies in the second row, and the dot diagram is
as follows:

    •
    •

Hence V is a D-cyclic space generated by a single function with D-annihilator
$(\phi(t))^2$. Furthermore, its rational canonical form is given by the companion
matrix of $(\phi(t))^2 = t^4 - 4t^3 + 14t^2 - 20t + 25$, which is

$$\begin{pmatrix} 0 & 0 & 0 & -25 \\ 1 & 0 & 0 & 20 \\ 0 & 1 & 0 & -14 \\ 0 & 0 & 1 & 4 \end{pmatrix}.$$

Thus $(\phi(t))^2$ is the only elementary divisor of D, and it has multiplicity 1. For
the cyclic generator, it suffices to find a function g in V for which $\phi(D)(g) \neq 0$.
Since $\phi(A)(e_3) = -4e_2 \neq 0$ (the third column of $\phi(A)$ is nonzero), it follows that
$\phi(D)(xe^x \cos 2x) \neq 0$; therefore $g(x) = xe^x \cos 2x$ can be chosen as the cyclic
generator. Hence

$$\beta_g = \{xe^x \cos 2x,\; D(xe^x \cos 2x),\; D^2(xe^x \cos 2x),\; D^3(xe^x \cos 2x)\}$$

is a rational canonical basis for D. Notice that the function h defined by
$h(x) = xe^x \sin 2x$ can be chosen in place of g. This shows that the rational
canonical basis is not unique.
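The computations of Example 4 can be reproduced mechanically. The following Python sketch is only an illustration under our own conventions: matrices are lists of rows, polynomials are coefficient lists with the highest degree first, and the helper names (`mat_mul`, `rank`, `poly_mul`, `companion`) are hypothetical, not from the text. It computes $\phi(A)$, its rank, $r_1$, and the companion matrix of $(\phi(t))^2$, using exact rational arithmetic so that no rounding can affect the rank.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(M):
    """Rank via Gauss-Jordan elimination with exact rational arithmetic."""
    R = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(R), len(R[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(n_rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c] / R[r][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def companion(p):
    """Companion matrix of a monic polynomial (coefficients highest degree first)."""
    n = len(p) - 1
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1          # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -p[n - i]  # last column: -a_0, -a_1, ..., -a_(n-1)
    return C


# A = [D]_beta for beta = {e^x cos 2x, e^x sin 2x, x e^x cos 2x, x e^x sin 2x}
A = [[1, 2, 1, 0],
     [-2, 1, 0, 1],
     [0, 0, 1, 2],
     [0, 0, -2, 1]]
A2 = mat_mul(A, A)
# phi(A) = A^2 - 2A + 5I
phi_A = [[A2[i][j] - 2 * A[i][j] + (5 if i == j else 0) for j in range(4)]
         for i in range(4)]
r1 = (4 - rank(phi_A)) // 2       # deg(phi) = 2, dim(V) = 4

phi = [1, -2, 5]                  # t^2 - 2t + 5
phi_squared = poly_mul(phi, phi)  # coefficients of (phi(t))^2
print(phi_A, r1)
print(phi_squared)
print(companion(phi_squared))
```

The third column of `phi_A` is $(0, -4, 0, 0)$, which is the fact used above to pick $xe^x \cos 2x$ as the cyclic generator.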


It is convenient to refer to the rational canonical form and elementary 
divisors of a matrix, which are defined in the obvious way. 


Definitions. Let $A \in M_{n\times n}(F)$. The rational canonical form of
A is defined to be the rational canonical form of $L_A$. Likewise, for A, the
elementary divisors and their multiplicities are the same as those of $L_A$.


Let A be an $n \times n$ matrix, let C be a rational canonical form of A, and let
β be the appropriate rational canonical basis for $L_A$. Then $C = [L_A]_\beta$, and
therefore A is similar to C. In fact, if Q is the matrix whose columns are the
vectors of β in the same order, then $Q^{-1}AQ = C$.


Example 5 


For the following real matrix A, we find the rational canonical form C of A
and a matrix Q such that $Q^{-1}AQ = C$.

$$A = \begin{pmatrix} 0 & 2 & 0 & -6 & 2 \\ 1 & -2 & 0 & 0 & 2 \\ 1 & 0 & 1 & -3 & 2 \\ 1 & -2 & 1 & -1 & 2 \\ 1 & -4 & 3 & -3 & 4 \end{pmatrix}$$

The characteristic polynomial of A is $f(t) = -(t^2 + 2)^2(t - 2)$; therefore
$\phi_1(t) = t^2 + 2$ and $\phi_2(t) = t - 2$ are the distinct irreducible monic divisors of
f(t). By Theorem 7.23, $\dim(K_{\phi_1}) = 4$ and $\dim(K_{\phi_2}) = 1$. Since the degree
of $\phi_1(t)$ is 2, the total number of dots in the dot diagram of $\phi_1(t)$ is $4/2 = 2$,
and the number of dots $r_1$ in the first row is given by

$$\begin{aligned} r_1 &= \tfrac{1}{2}[\dim(R^5) - \operatorname{rank}(\phi_1(A))] \\ &= \tfrac{1}{2}[5 - \operatorname{rank}(A^2 + 2I)] \\ &= \tfrac{1}{2}[5 - 1] = 2. \end{aligned}$$


Thus the dot diagram of $\phi_1(t)$ is

    •  •

and each column contributes the companion matrix

$$\begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}$$

for $\phi_1(t) = t^2 + 2$ to the rational canonical form C. Consequently $\phi_1(t)$ is an
elementary divisor with multiplicity 2. Since $\dim(K_{\phi_2}) = 1$, the dot diagram
of $\phi_2(t) = t - 2$ consists of a single dot, which contributes the $1 \times 1$ matrix
$(2)$. Hence $\phi_2(t)$ is an elementary divisor with multiplicity 1. Therefore the
rational canonical form C is

$$C = \begin{pmatrix} 0 & -2 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -2 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}.$$


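We can infer from the dot diagram of $\phi_1(t)$ that if β is a rational canonical
basis for $L_A$, then $\beta \cap K_{\phi_1}$ is the union of two cyclic bases $\beta_{v_1}$ and $\beta_{v_2}$, where
$v_1$ and $v_2$ each have annihilator $\phi_1(t)$. It follows that both $v_1$ and $v_2$ lie in
$N(\phi_1(L_A))$. It can be shown that

$$\left\{ \begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\\0\\0\end{pmatrix}, \begin{pmatrix}0\\0\\2\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\-1\\0\\1\end{pmatrix} \right\}$$

is a basis for $N(\phi_1(L_A))$. Setting $v_1 = e_1$, we see that

$$Av_1 = \begin{pmatrix}0\\1\\1\\1\\1\end{pmatrix}.$$

Next choose $v_2$ in $K_{\phi_1} = N(\phi_1(L_A))$, but not in the span of $\beta_{v_1} = \{v_1, Av_1\}$.
For example, $v_2 = e_2$. Then it can be seen that

$$Av_2 = \begin{pmatrix}2\\-2\\0\\-2\\-4\end{pmatrix},$$

and $\beta_{v_1} \cup \beta_{v_2}$ is a basis for $K_{\phi_1}$.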
Since the dot diagram of $\phi_2(t) = t - 2$ consists of a single dot, any nonzero
vector in $K_{\phi_2}$ is an eigenvector of A corresponding to the eigenvalue $\lambda = 2$.
For example, choose

$$v_3 = \begin{pmatrix}0\\1\\1\\1\\2\end{pmatrix}.$$

By Theorem 7.23, $\beta = \{v_1, Av_1, v_2, Av_2, v_3\}$ is a rational canonical basis for
$L_A$. So setting

$$Q = \begin{pmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & 1 & 1 & -2 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -2 & 1 \\ 0 & 1 & 0 & -4 & 2 \end{pmatrix},$$

we have $Q^{-1}AQ = C$.
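A convenient way to confirm the last claim without computing $Q^{-1}$ is to check that $AQ = QC$ and $\det(Q) \neq 0$; together these are equivalent to $Q^{-1}AQ = C$. The Python sketch below does this with exact arithmetic; the helper names `mat_mul` and `det` are our own.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def det(M):
    """Determinant via Gaussian elimination with exact rational arithmetic."""
    R = [[Fraction(x) for x in row] for row in M]
    n = len(R)
    d = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if R[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            R[c], R[pivot] = R[pivot], R[c]
            d = -d                       # a row swap flips the sign
        d *= R[c][c]
        for i in range(c + 1, n):
            factor = R[i][c] / R[c][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[c])]
    return d


A = [[0, 2, 0, -6, 2],
     [1, -2, 0, 0, 2],
     [1, 0, 1, -3, 2],
     [1, -2, 1, -1, 2],
     [1, -4, 3, -3, 4]]
# Columns of Q: v1 = e1, A v1, v2 = e2, A v2, v3
Q = [[1, 0, 0, 2, 0],
     [0, 1, 1, -2, 1],
     [0, 1, 0, 0, 1],
     [0, 1, 0, -2, 1],
     [0, 1, 0, -4, 2]]
C = [[0, -2, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, -2, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 2]]

# Q^{-1} A Q = C holds exactly when Q is invertible and A Q = Q C.
print(det(Q), mat_mul(A, Q) == mat_mul(Q, C))   # 2 True
```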


Example 6 


For the following matrix A, we find the rational canonical form C and a
matrix Q such that $Q^{-1}AQ = C$:


$$A = \begin{pmatrix} 2 & 0 & 1 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$


Since the characteristic polynomial of A is $f(t) = (t - 2)^4$, the only irreducible
monic divisor of f(t) is $\phi(t) = t - 2$, and so $K_\phi = R^4$. In this case, $\phi(t)$ has
degree 1; hence in applying Theorem 7.24 to compute the dot diagram for
$\phi(t)$, we obtain

$$r_1 = 4 - \operatorname{rank}(\phi(A)) = 4 - 2 = 2,$$
$$r_2 = \operatorname{rank}(\phi(A)) - \operatorname{rank}((\phi(A))^2) = 2 - 1 = 1,$$

and

$$r_3 = \operatorname{rank}((\phi(A))^2) - \operatorname{rank}((\phi(A))^3) = 1 - 0 = 1,$$

where $r_i$ is the number of dots in the $i$th row of the dot diagram. Since there
are $\dim(R^4) = 4$ dots in the diagram, we may terminate these computations
with $r_3$. Thus the dot diagram for A is

    •  •
    •
    •


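Since $(t - 2)^3$ has the companion matrix

$$\begin{pmatrix} 0 & 0 & 8 \\ 1 & 0 & -12 \\ 0 & 1 & 6 \end{pmatrix}$$

and $(t - 2)$ has the companion matrix $(2)$, the rational canonical form of A
is given by

$$C = \begin{pmatrix} 0 & 0 & 8 & 0 \\ 1 & 0 & -12 & 0 \\ 0 & 1 & 6 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$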


Next we find a rational canonical basis for $L_A$. The preceding dot diagram
indicates that there are two vectors $v_1$ and $v_2$ in $R^4$ with annihilators $(\phi(t))^3$
and $\phi(t)$, respectively, and such that

$$\beta = \beta_{v_1} \cup \beta_{v_2} = \{v_1, Av_1, A^2v_1, v_2\}$$

is a rational canonical basis for $L_A$. Furthermore, $v_1 \notin N((L_A - 2I)^2)$, and
$v_2 \in N(L_A - 2I)$. It can easily be shown that

$$N(L_A - 2I) = \operatorname{span}(\{e_2, e_4\})$$

and

$$N((L_A - 2I)^2) = \operatorname{span}(\{e_1, e_2, e_4\}).$$


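The standard vector $e_3$ meets the criteria for $v_1$; so we set $v_1 = e_3$. It follows
that

$$Av_1 = \begin{pmatrix}1\\0\\2\\0\end{pmatrix} \quad\text{and}\quad A^2v_1 = \begin{pmatrix}4\\1\\4\\0\end{pmatrix}.$$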


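Next we choose a vector $v_2 \in N(L_A - 2I)$ that is not in the span of $\beta_{v_1}$. Clearly,
$v_2 = e_4$ satisfies this condition. Thus

$$\left\{ \begin{pmatrix}0\\0\\1\\0\end{pmatrix}, \begin{pmatrix}1\\0\\2\\0\end{pmatrix}, \begin{pmatrix}4\\1\\4\\0\end{pmatrix}, \begin{pmatrix}0\\0\\0\\1\end{pmatrix} \right\}$$

is a rational canonical basis for $L_A$.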


Finally, let Q be the matrix whose columns are the vectors of β in the
same order:

$$Q = \begin{pmatrix} 0 & 1 & 4 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Then $C = Q^{-1}AQ$. ◆
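The rank formula of Theorem 7.24, as used in Examples 4, 5, and 6, can be mechanized. The following Python sketch (all helper names are hypothetical) evaluates $\phi(A)$ by Horner's rule, then returns the row lengths $r_1, r_2, \ldots$ of the dot diagram, using $r_1 = \frac{1}{d}[n - \operatorname{rank}(\phi(A))]$ and $r_i = \frac{1}{d}[\operatorname{rank}((\phi(A))^{i-1}) - \operatorname{rank}((\phi(A))^i)]$ with $d = \deg \phi$. It assumes φ is monic and stops at the first row with no dots; if φ does not divide the characteristic polynomial, $\phi(A)$ is invertible and the list is empty. It is checked on the matrices of Examples 4 and 5.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(M):
    """Rank via Gauss-Jordan elimination with exact rational arithmetic."""
    R = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(R), len(R[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(n_rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c] / R[r][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


def poly_at_matrix(coeffs, A):
    """Evaluate a polynomial (coefficients highest degree first) at A by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = mat_mul(result, A)       # result <- result * A
        for i in range(n):
            result[i][i] += c             # result <- result + c * I
    return result


def dot_diagram_rows(A, phi):
    """Row lengths r_1, r_2, ... of the dot diagram of the monic polynomial phi."""
    n, d = len(A), len(phi) - 1
    P = poly_at_matrix(phi, A)
    rows, prev_rank, power = [], n, P     # rank(phi(A)^0) = rank(I) = n
    while True:
        cur_rank = rank(power)
        r = (prev_rank - cur_rank) // d
        if r == 0:
            return rows
        rows.append(r)
        prev_rank, power = cur_rank, mat_mul(power, P)


A4 = [[1, 2, 1, 0], [-2, 1, 0, 1], [0, 0, 1, 2], [0, 0, -2, 1]]    # Example 4
A5 = [[0, 2, 0, -6, 2], [1, -2, 0, 0, 2], [1, 0, 1, -3, 2],
      [1, -2, 1, -1, 2], [1, -4, 3, -3, 4]]                         # Example 5
print(dot_diagram_rows(A4, [1, -2, 5]))   # phi(t) = t^2 - 2t + 5: [1, 1]
print(dot_diagram_rows(A5, [1, 0, 2]))    # phi_1(t) = t^2 + 2:    [2]
print(dot_diagram_rows(A5, [1, -2]))      # phi_2(t) = t - 2:      [1]
```

A result of `[1, 1]` means one column of two dots (Example 4), while `[2]` means two columns of one dot each (Example 5).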


Direct Sums* 


The next theorem is a simple consequence of Theorem 7.23. 


Theorem 7.25 (Primary Decomposition Theorem). Let T be a
linear operator on an n-dimensional vector space V with characteristic
polynomial

$$f(t) = (-1)^n (\phi_1(t))^{n_1} (\phi_2(t))^{n_2} \cdots (\phi_k(t))^{n_k},$$

where the $\phi_i(t)$'s $(1 \le i \le k)$ are distinct irreducible monic polynomials and
the $n_i$'s are positive integers. Then the following statements are true.
(a) $V = K_{\phi_1} \oplus K_{\phi_2} \oplus \cdots \oplus K_{\phi_k}$.
(b) If $T_i$ $(1 \le i \le k)$ is the restriction of T to $K_{\phi_i}$ and $C_i$ is the rational
canonical form of $T_i$, then $C_1 \oplus C_2 \oplus \cdots \oplus C_k$ is the rational canonical
form of T.


Proof. Exercise. ■
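Part (b) says the rational canonical form is assembled block-diagonally from companion matrices, one for each elementary divisor, repeated according to multiplicity. A small Python sketch of that assembly follows; the functions `companion` and `direct_sum` are our own illustrative helpers. It reproduces the matrices C of Examples 5 and 6 from their elementary divisors.

```python
def companion(p):
    """Companion matrix of a monic polynomial (coefficients highest degree first)."""
    n = len(p) - 1
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1          # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -p[n - i]  # last column: -a_0, -a_1, ..., -a_(n-1)
    return C


def direct_sum(*blocks):
    """Block-diagonal matrix C_1 (+) C_2 (+) ... (+) C_k."""
    size = sum(len(B) for B in blocks)
    M = [[0] * size for _ in range(size)]
    offset = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(B)
    return M


# Example 5: t^2 + 2 with multiplicity 2, and t - 2 with multiplicity 1
C5 = direct_sum(companion([1, 0, 2]), companion([1, 0, 2]), companion([1, -2]))
# Example 6: (t - 2)^3 = t^3 - 6t^2 + 12t - 8, and t - 2
C6 = direct_sum(companion([1, -6, 12, -8]), companion([1, -2]))
for row in C5:
    print(row)
```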


The next theorem is a simple consequence of Theorem 7.17. 


Theorem 7.26. Let T be a linear operator on a finite-dimensional vector
space V. Then V is a direct sum of T-cyclic subspaces $C_{v_i}$, where each $v_i$ lies
in $K_\phi$ for some irreducible monic divisor $\phi(t)$ of the characteristic polynomial
of T.


Proof. Exercise. ■


EXERCISES 


1. Label the following statements as true or false.

(a) Every rational canonical basis for a linear operator T is the union
of T-cyclic bases.
(b) If a basis is the union of T-cyclic bases for a linear operator T,
then it is a rational canonical basis for T.
(c) There exist square matrices having no rational canonical form.
(d) A square matrix is similar to its rational canonical form.
(e) For any linear operator T on a finite-dimensional vector space, any
irreducible factor of the characteristic polynomial of T divides the
minimal polynomial of T.
(f) Let $\phi(t)$ be an irreducible monic divisor of the characteristic
polynomial of a linear operator T. The dots in the diagram used to
compute the rational canonical form of the restriction of T to $K_\phi$
are in one-to-one correspondence with the vectors in a basis for
$K_\phi$.
(g) If a matrix has a Jordan canonical form, then its Jordan canonical
form and rational canonical form are similar.


2. For each of the following matrices $A \in M_{n\times n}(F)$, find the rational
canonical form C of A and a matrix $Q \in M_{n\times n}(F)$ such that $Q^{-1}AQ = C$.
(a) $A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}$, $F = R$
(b) $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, $F = R$
0 1 
(c) A=(} i) F=C 
0 -—7 14 -6 
1 -4 6 -3 
(d) A= ee a ae F=R 
0 -4 11 —5 
0 -4 12 -—7 
1 -1 3 -8 
(e) A= Oo 6c F=R 
0 -1 8 -5 


3. For each of the following linear operators T, find the elementary divisors,
the rational canonical form C, and a rational canonical basis β.


(a) T is the linear operator on P3(R) defined by 
T(f(@)) = FO) — f’'(). 
(b) Let $S = \{\sin x, \cos x, x\sin x, x\cos x\}$, a subset of F(R, R), and let
V = span(S). Define T to be the linear operator on V such that
$T(f) = f'$.
(c) T is the linear operator on $M_{2\times 2}(R)$ defined by
T[A)= & i) oa


(d) Let $S = \{\sin x \sin y, \sin x \cos y, \cos x \sin y, \cos x \cos y\}$, a subset of
F(R × R, R), and let V = span(S). Define T to be the linear
operator on V such that

$$T(f)(x, y) = \frac{\partial f(x, y)}{\partial x} + \frac{\partial f(x, y)}{\partial y}.$$


4. Let T be a linear operator on a finite-dimensional vector space V with
minimal polynomial $(\phi(t))^m$ for some positive integer m.

(a) Prove that $R(\phi(T)) \subseteq N((\phi(T))^{m-1})$.
(b) Give an example to show that the subspaces in (a) need not be
equal.
(c) Prove that the minimal polynomial of the restriction of T to
$R(\phi(T))$ equals $(\phi(t))^{m-1}$.


5. Let T be a linear operator on a finite-dimensional vector space. Prove
that the rational canonical form of T is a diagonal matrix if and only if
T is diagonalizable.


6. Let T be a linear operator on a finite-dimensional vector space V with
characteristic polynomial $f(t) = (-1)^n \phi_1(t)\phi_2(t)$, where $\phi_1(t)$ and $\phi_2(t)$
are distinct irreducible monic polynomials and $n = \dim(V)$.

(a) Prove that there exist $v_1, v_2 \in V$ such that $v_1$ has T-annihilator
$\phi_1(t)$, $v_2$ has T-annihilator $\phi_2(t)$, and $\beta_{v_1} \cup \beta_{v_2}$ is a basis for V.
(b) Prove that there is a vector $v_3 \in V$ with T-annihilator $\phi_1(t)\phi_2(t)$
such that $\beta_{v_3}$ is a basis for V.
(c) Describe the difference between the matrix representation of T
with respect to $\beta_{v_1} \cup \beta_{v_2}$ and the matrix representation of T with
respect to $\beta_{v_3}$.

Thus, to assure the uniqueness of the rational canonical form, we require
that the generators of the T-cyclic bases that constitute a rational
canonical basis have T-annihilators equal to powers of irreducible monic
factors of the characteristic polynomial of T.


7. Let T be a linear operator on a finite-dimensional vector space with
minimal polynomial

$$f(t) = (\phi_1(t))^{m_1} (\phi_2(t))^{m_2} \cdots (\phi_k(t))^{m_k},$$

where the $\phi_i(t)$'s are distinct irreducible monic factors of f(t). Prove
that for each i, $m_i$ is the number of entries in the first column of the
dot diagram for $\phi_i(t)$.
8. Let T be a linear operator on a finite-dimensional vector space V. Prove
that for any irreducible polynomial $\phi(t)$, if $\phi(T)$ is not one-to-one, then
$\phi(t)$ divides the characteristic polynomial of T. Hint: Apply Exercise 15
of Section 7.3.


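9. Let V be a vector space and $\beta_1, \beta_2, \ldots, \beta_k$ be disjoint subsets of V whose
union is a basis for V. Now suppose that $\gamma_1, \gamma_2, \ldots, \gamma_k$ are linearly
independent subsets of V such that $\operatorname{span}(\gamma_i) = \operatorname{span}(\beta_i)$ for all i. Prove
that $\gamma_1 \cup \gamma_2 \cup \cdots \cup \gamma_k$ is also a basis for V.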


10. Let T be a linear operator on a finite-dimensional vector space, and
suppose that $\phi(t)$ is an irreducible monic factor of the characteristic
polynomial of T. Prove that if $\phi(t)$ is the T-annihilator of vectors x and
y, then $x \in C_y$ if and only if $C_x = C_y$.


Exercises 11 and 12 are concerned with direct sums.

11. Prove Theorem 7.25.

12. Prove Theorem 7.26.


Appendices


APPENDIX A SETS


A set is a collection of objects, called elements of the set. If x is an element
of the set A, then we write $x \in A$; otherwise, we write $x \notin A$. For example,
if Z is the set of integers, then $3 \in Z$ and $\tfrac{1}{2} \notin Z$.

One set that appears frequently is the set of real numbers, which we denote 
by R throughout this text. 

Two sets A and B are called equal, written A = B, if they contain exactly 
the same elements. Sets may be described in one of two ways: 


1. By listing the elements of the set between set braces { }. 
2. By describing the elements of the set in terms of some characteristic 
property. 
For example, the set consisting of the elements 1, 2, 3, and 4 can be 
written as {1,2,3,4} or as 


{x: x is a positive integer less than 5}. 


Note that the order in which the elements of a set are listed is immaterial; 
hence 


{1,2, 3,4} = {3,1, 2,4} = {1,3, 1, 4, 2}. 
Example 1 


Let A denote the set of real numbers between 1 and 2. Then A may be 
written as 


$$A = \{x \in R : 1 < x < 2\}. \quad ◆$$


A set B is called a subset of a set A, written $B \subseteq A$ or $A \supseteq B$, if every
element of B is an element of A. For example, $\{1, 2, 6\} \subseteq \{2, 8, 7, 6, 1\}$. If
$B \subseteq A$ and $B \neq A$, then B is called a proper subset of A. Observe that
$A = B$ if and only if $A \subseteq B$ and $B \subseteq A$, a fact that is often used to prove
that two sets are equal.

The empty set, denoted by ∅, is the set containing no elements. The
empty set is a subset of every set.

Sets may be combined to form other sets in two basic ways. The union
of two sets A and B, denoted $A \cup B$, is the set of elements that are in A, or
B, or both; that is,

$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$

The intersection of two sets A and B, denoted $A \cap B$, is the set of elements
that are in both A and B; that is,

$$A \cap B = \{x : x \in A \text{ and } x \in B\}.$$
Two sets are called disjoint if their intersection equals the empty set. 


Example 2 
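Let A = {1, 3, 5} and B = {1, 5, 7, 8}. Then

$$A \cup B = \{1, 3, 5, 7, 8\} \quad\text{and}\quad A \cap B = \{1, 5\}.$$

Likewise, if X = {1, 2, 8} and Y = {3, 4, 5}, then

$$X \cup Y = \{1, 2, 3, 4, 5, 8\} \quad\text{and}\quad X \cap Y = \varnothing.$$

Thus X and Y are disjoint sets.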
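Python's built-in `set` type models these operations directly, which makes Example 2 easy to check: `|` is union, `&` is intersection, and `<=` tests for a subset. A short sketch (variable names follow the example; `family` is a hypothetical list of sets used to illustrate unions and intersections of more than two sets):

```python
A = {1, 3, 5}
B = {1, 5, 7, 8}
X = {1, 2, 8}
Y = {3, 4, 5}

print(sorted(A | B))            # union:        [1, 3, 5, 7, 8]
print(sorted(A & B))            # intersection: [1, 5]
print(X & Y == set())           # True: X and Y are disjoint
print(X.isdisjoint(Y))          # True

# Union and intersection of a finite family A_1, ..., A_n
family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(sorted(set().union(*family)))        # [1, 2, 3, 4, 5]
print(sorted(set.intersection(*family)))   # [3]

# Subset test: {1, 2, 6} is a subset of {2, 8, 7, 6, 1}
print({1, 2, 6} <= {2, 8, 7, 6, 1})        # True
```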


The union and intersection of more than two sets can be defined
analogously. Specifically, if $A_1, A_2, \ldots, A_n$ are sets, then the union and
intersection of these sets are defined, respectively, by

$$\bigcup_{i=1}^{n} A_i = \{x : x \in A_i \text{ for some } i = 1, 2, \ldots, n\}$$

and

$$\bigcap_{i=1}^{n} A_i = \{x : x \in A_i \text{ for all } i = 1, 2, \ldots, n\}.$$

Similarly, if Λ is an index set and $\{A_\alpha : \alpha \in \Lambda\}$ is a collection of sets, the
union and intersection of these sets are defined, respectively, by

$$\bigcup_{\alpha \in \Lambda} A_\alpha = \{x : x \in A_\alpha \text{ for some } \alpha \in \Lambda\}$$

and

$$\bigcap_{\alpha \in \Lambda} A_\alpha = \{x : x \in A_\alpha \text{ for all } \alpha \in \Lambda\}.$$
Example 3 


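Let $\Lambda = \{\alpha \in R : \alpha > 1\}$, and let

$$A_\alpha = \left\{x \in R : \frac{-1}{\alpha} \le x \le 1 + \alpha\right\}$$

for each $\alpha \in \Lambda$. Then

$$\bigcup_{\alpha \in \Lambda} A_\alpha = \{x \in R : x > -1\} \quad\text{and}\quad \bigcap_{\alpha \in \Lambda} A_\alpha = \{x \in R : 0 \le x \le 2\}. \quad ◆$$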
By a relation on a set A, we mean a rule for determining whether or not,
for any elements x and y in A, x stands in a given relationship to y. More
precisely, a relation on A is a set S of ordered pairs of elements of A such
that $(x, y) \in S$ if and only if x stands in the given relationship to y. On the
set of real numbers, for instance, “is equal to,” “is less than,” and “is greater
than or equal to” are familiar relations. If S is a relation on a set A, we often
write $x \sim y$ in place of $(x, y) \in S$.

A relation S on a set A is called an equivalence relation on A if the
following three conditions hold:


1. For each $x \in A$, $x \sim x$ (reflexivity).
2. If $x \sim y$, then $y \sim x$ (symmetry).
3. If $x \sim y$ and $y \sim z$, then $x \sim z$ (transitivity).


For example, if we define $x \sim y$ to mean that $x - y$ is divisible by a fixed
integer n, then ~ is an equivalence relation on the set of integers.
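On a finite set, the three conditions can be checked by brute force over all pairs and triples. The Python sketch below (the function name `is_equivalence` is our own) tests congruence modulo n = 3 on the integers from -10 to 10, and contrasts it with “is less than,” which fails reflexivity. A check on a finite sample illustrates the definition; it is not a proof for all integers.

```python
from itertools import product


def is_equivalence(elements, related):
    """Brute-force test of reflexivity, symmetry, and transitivity on a finite set."""
    elements = list(elements)
    reflexive = all(related(x, x) for x in elements)
    symmetric = all(related(y, x)
                    for x, y in product(elements, repeat=2) if related(x, y))
    transitive = all(related(x, z)
                     for x, y, z in product(elements, repeat=3)
                     if related(x, y) and related(y, z))
    return reflexive and symmetric and transitive


n = 3
congruent_mod_n = lambda x, y: (x - y) % n == 0    # x ~ y iff n divides x - y
sample = range(-10, 11)
print(is_equivalence(sample, congruent_mod_n))     # True
print(is_equivalence(sample, lambda x, y: x < y))  # False: not reflexive
```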


APPENDIX B FUNCTIONS


If A and B are sets, then a function f from A to B, written $f: A \to B$, is
a rule that associates to each element x in A a unique element denoted f(x)
in B. The element f(x) is called the image of x (under f), and x is called
a preimage of f(x) (under f). If $f: A \to B$, then A is called the domain
of f, B is called the codomain of f, and the set $\{f(x) : x \in A\}$ is called the
range of f. Note that the range of f is a subset of B. If $S \subseteq A$, we denote
by f(S) the set $\{f(x) : x \in S\}$ of all images of elements of S. Likewise, if
$T \subseteq B$, we denote by $f^{-1}(T)$ the set $\{x \in A : f(x) \in T\}$ of all preimages of
elements in T. Finally, two functions $f: A \to B$ and $g: A \to B$ are equal,
written $f = g$, if $f(x) = g(x)$ for all $x \in A$.


Example 1 


Suppose that $A = [-10, 10]$. Let $f: A \to R$ be the function that assigns
to each element x in A the element $x^2 + 1$ in R; that is, f is defined by
$f(x) = x^2 + 1$. Then A is the domain of f, R is the codomain of f, and [1, 101]
is the range of f. Since f(2) = 5, the image of 2 is 5, and 2 is a preimage
of 5. Notice that $-2$ is another preimage of 5. Moreover, if S = [1, 2] and
T = [82, 101], then f(S) = [2, 5] and $f^{-1}(T) = [-10, -9] \cup [9, 10]$. ◆


As Example 1 shows, the preimage of an element in the range need not be
unique. Functions such that each element of the range has a unique preimage
are called one-to-one; that is, $f: A \to B$ is one-to-one if $f(x) = f(y)$ implies
$x = y$ or, equivalently, if $x \neq y$ implies $f(x) \neq f(y)$.

If $f: A \to B$ is a function with range B, that is, if $f(A) = B$, then f is
called onto. So f is onto if and only if the range of f equals the codomain
of f.

Let $f: A \to B$ be a function and $S \subseteq A$. Then a function $f_S: S \to B$,
called the restriction of f to S, can be formed by defining $f_S(x) = f(x)$ for
each $x \in S$.

The next example illustrates these concepts. 


Example 2 


Let $f: [-1, 1] \to [0, 1]$ be defined by $f(x) = x^2$. This function is onto, but
not one-to-one since $f(-1) = f(1) = 1$. Note that if S = [0, 1], then $f_S$ is
both onto and one-to-one. Finally, if $T = [\tfrac{1}{2}, 1]$, then $f_T$ is one-to-one, but
not onto.


Let A, B, and C be sets and $f: A \to B$ and $g: B \to C$ be functions. By
following f with g, we obtain a function $g \circ f: A \to C$ called the composite
of g and f. Thus $(g \circ f)(x) = g(f(x))$ for all $x \in A$. For example, let
$A = B = C = R$, $f(x) = \sin x$, and $g(x) = x^2 + 3$. Then
$(g \circ f)(x) = g(f(x)) = \sin^2 x + 3$, whereas $(f \circ g)(x) = f(g(x)) = \sin(x^2 + 3)$.
Hence, $g \circ f \neq f \circ g$. Functional composition is associative, however; that is, if
$h: C \to D$ is another function, then $h \circ (g \circ f) = (h \circ g) \circ f$.

A function $f: A \to B$ is said to be invertible if there exists a function
$g: B \to A$ such that $(f \circ g)(y) = y$ for all $y \in B$ and $(g \circ f)(x) = x$ for all
$x \in A$. If such a function g exists, then it is unique and is called the inverse
of f. We denote the inverse of f (when it exists) by $f^{-1}$. It can be shown
that f is invertible if and only if f is both one-to-one and onto.


Example 3 


The function $f: R \to R$ defined by $f(x) = 3x + 1$ is one-to-one and onto;
hence f is invertible. The inverse of f is the function $f^{-1}: R \to R$ defined
by $f^{-1}(x) = (x - 1)/3$. ◆


The following facts about invertible functions are easily proved.


1. If $f: A \to B$ is invertible, then $f^{-1}$ is invertible, and $(f^{-1})^{-1} = f$.
2. If $f: A \to B$ and $g: B \to C$ are invertible, then $g \circ f$ is invertible, and
$(g \circ f)^{-1} = f^{-1} \circ g^{-1}$.
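Composition and inverses are easy to experiment with in code. The Python sketch below uses a hypothetical helper `compose`; it shows that $g \circ f \neq f \circ g$ for the sine example, and checks Example 3 and the rule $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ on sample points. The second function `g(x) = 2x - 5` is our own illustrative choice, not from the text. Exact rationals (`fractions.Fraction`) are used so that the identity checks are not spoiled by rounding; checking finitely many points illustrates the facts but does not prove them.

```python
import math
from fractions import Fraction


def compose(g, f):
    """Return the composite g o f, that is, x -> g(f(x))."""
    return lambda x: g(f(x))


# g o f and f o g generally differ (f = sin, g(x) = x^2 + 3 as in the text)
sin_then_square = compose(lambda x: x ** 2 + 3, math.sin)   # (g o f)(x) = sin^2 x + 3
square_then_sin = compose(math.sin, lambda x: x ** 2 + 3)   # (f o g)(x) = sin(x^2 + 3)
print(sin_then_square(1.0), square_then_sin(1.0))

# Example 3: f(x) = 3x + 1 and its inverse f^{-1}(x) = (x - 1)/3
f = lambda x: 3 * x + 1
f_inv = lambda x: (x - 1) / 3
# Hypothetical second invertible function, used only to test the inverse rule
g = lambda x: 2 * x - 5
g_inv = lambda x: (x + 5) / 2

samples = [Fraction(k, 7) for k in range(-20, 21)]   # exact rationals, no rounding
identity_ok = all(compose(f, f_inv)(x) == x and compose(f_inv, f)(x) == x
                  for x in samples)
gf, gf_inv = compose(g, f), compose(f_inv, g_inv)    # (g o f)^{-1} = f^{-1} o g^{-1}
inverse_rule_ok = all(compose(gf, gf_inv)(x) == x and compose(gf_inv, gf)(x) == x
                      for x in samples)
print(identity_ok, inverse_rule_ok)   # True True
```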


APPENDIX C FIELDS


The set of real numbers is an example of an algebraic structure called a 
field. Basically, a field is a set in which four operations (called addition, 
multiplication, subtraction, and division) can be defined so that, with the 
exception of division by zero, the sum, product, difference, and quotient of 
any two elements in the set is an element of the set. More precisely, a field is 
defined as follows. 
Definitions. A field F is a set on which two operations + and · (called
addition and multiplication, respectively) are defined so that, for each pair
of elements x, y in F, there are unique elements $x + y$ and $x \cdot y$ in F for which
the following conditions hold for all elements a, b, c in F.

(F 1) $a + b = b + a$ and $a \cdot b = b \cdot a$
(commutativity of addition and multiplication)

(F 2) $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$
(associativity of addition and multiplication)

(F 3) There exist distinct elements 0 and 1 in F such that

$$0 + a = a \quad\text{and}\quad 1 \cdot a = a$$

(existence of identity elements for addition and multiplication)


(F 4) For each element a in F and each nonzero element b in F, there exist
elements c and d in F such that

$$a + c = 0 \quad\text{and}\quad b \cdot d = 1$$

(existence of inverses for addition and multiplication)
(F 5) $a \cdot (b + c) = a \cdot b + a \cdot c$

(distributivity of multiplication over addition)
The elements $x + y$ and $x \cdot y$ are called the sum and product, respectively,
of x and y. The elements 0 (read “zero”) and 1 (read “one”) mentioned in
(F 3) are called identity elements for addition and multiplication,
respectively, and the elements c and d referred to in (F 4) are called an additive
inverse for a and a multiplicative inverse for b, respectively.


Example 1 

The set of real numbers R with the usual definitions of addition and
multiplication is a field.

Example 2


The set of rational numbers with the usual definitions of addition and
multiplication is a field.


Example 3

The set of all real numbers of the form $a + b\sqrt{2}$, where a and b are rational
numbers, with addition and multiplication as in R is a field.

Example 4


The field $Z_2$ consists of two elements 0 and 1 with the operations of addition
and multiplication defined by the equations

$$0 + 0 = 0, \quad 0 + 1 = 1 + 0 = 1, \quad 1 + 1 = 0,$$
$$0 \cdot 0 = 0, \quad 0 \cdot 1 = 1 \cdot 0 = 0, \quad\text{and}\quad 1 \cdot 1 = 1.$$
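For a finite set such as $Z_2$, every condition (F 1) through (F 5) can be verified exhaustively. The Python sketch below (the function `is_field_mod` is our own) takes the set {0, 1, ..., n − 1} with addition and multiplication reduced modulo n and checks each condition by brute force. It confirms Example 4 for n = 2, and shows, for instance, that n = 6 fails (F 4) because 2 has no multiplicative inverse modulo 6.

```python
from itertools import product


def is_field_mod(n):
    """Brute-force check of (F 1)-(F 5) for {0, 1, ..., n-1} with arithmetic mod n."""
    if n < 2:                       # (F 3) requires distinct elements 0 and 1
        return False
    elems = range(n)
    add = lambda a, b: (a + b) % n
    mul = lambda a, b: (a * b) % n
    for a, b in product(elems, repeat=2):                        # (F 1)
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False
    for a, b, c in product(elems, repeat=3):
        if add(add(a, b), c) != add(a, add(b, c)):               # (F 2)
            return False
        if mul(mul(a, b), c) != mul(a, mul(b, c)):
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):       # (F 5)
            return False
    if any(add(0, a) != a or mul(1, a) != a for a in elems):    # (F 3)
        return False
    for a in elems:                                              # (F 4), additive
        if not any(add(a, c) == 0 for c in elems):
            return False
    for b in range(1, n):                                        # (F 4), multiplicative
        if not any(mul(b, d) == 1 for d in elems):
            return False
    return True


print(is_field_mod(2))   # True: the field Z_2 of Example 4
print(is_field_mod(5))   # True
print(is_field_mod(6))   # False: 2 has no multiplicative inverse mod 6
```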
Example 5


Neither the set of positive integers nor the set of integers with the usual
definitions of addition and multiplication is a field, for in either case (F 4)
does not hold.


The identity and inverse elements guaranteed by (F 3) and (F 4) are
unique; this is a consequence of the following theorem.


Theorem C.1 (Cancellation Laws). For arbitrary elements a, b, and
c in a field, the following statements are true.
(a) If $a + b = c + b$, then $a = c$.
(b) If $a \cdot b = c \cdot b$ and $b \neq 0$, then $a = c$.


Proof. (a) The proof of (a) is left as an exercise.

(b) If $b \neq 0$, then (F 4) guarantees the existence of an element d in the
field such that $b \cdot d = 1$. Multiply both sides of the equality $a \cdot b = c \cdot b$ by d
to obtain $(a \cdot b) \cdot d = (c \cdot b) \cdot d$. Consider the left side of this equality: By (F 2)
and (F 3), we have

$$(a \cdot b) \cdot d = a \cdot (b \cdot d) = a \cdot 1 = a.$$

Similarly, the right side of the equality reduces to c. Thus $a = c$. ■


Corollary. The elements 0 and 1 mentioned in (F 3), and the elements c
and d mentioned in (F 4), are unique.


Proof. Suppose that $0' \in F$ satisfies $0' + a = a$ for each $a \in F$. Since
$0 + a = a$ for each $a \in F$, we have $0' + a = 0 + a$ for each $a \in F$. Thus $0' = 0$
by Theorem C.1.

The proofs of the remaining parts are similar. ■


Thus each element b in a field has a unique additive inverse and, if $b \neq 0$,
a unique multiplicative inverse. (It is shown in the corollary to Theorem C.2
that 0 has no multiplicative inverse.) The additive inverse and the
multiplicative inverse of b are denoted by $-b$ and $b^{-1}$, respectively. Note that
$-(-b) = b$ and $(b^{-1})^{-1} = b$.

Subtraction and division can be defined in terms of addition and
multiplication by using the additive and multiplicative inverses. Specifically,
subtraction of b is defined to be addition of $-b$ and division by $b \neq 0$ is defined
to be multiplication by $b^{-1}$; that is,

$$a - b = a + (-b) \quad\text{and}\quad \frac{a}{b} = a \cdot b^{-1}.$$

In particular, the symbol $\dfrac{1}{b}$ denotes $b^{-1}$. Division by zero is undefined, but,
with this exception, the sum, product, difference, and quotient of any two
elements of a field are defined.
Many of the familiar properties of multiplication of real numbers are true
in any field, as the next theorem shows.


Theorem C.2. Let a and b be arbitrary elements of a field. Then each
of the following statements is true.
(a) $a \cdot 0 = 0$.
(b) $(-a) \cdot b = a \cdot (-b) = -(a \cdot b)$.
(c) $(-a) \cdot (-b) = a \cdot b$.


Proof. (a) Since $0 + 0 = 0$, (F 5) shows that

$$0 + a \cdot 0 = a \cdot 0 = a \cdot (0 + 0) = a \cdot 0 + a \cdot 0.$$

Thus $0 = a \cdot 0$ by Theorem C.1.

(b) By definition, $-(a \cdot b)$ is the unique element of F with the property
$a \cdot b + [-(a \cdot b)] = 0$. So in order to prove that $(-a) \cdot b = -(a \cdot b)$, it suffices
to show that $a \cdot b + (-a) \cdot b = 0$. But $-a$ is the element of F such that
$a + (-a) = 0$; so

$$a \cdot b + (-a) \cdot b = [a + (-a)] \cdot b = 0 \cdot b = b \cdot 0 = 0$$

by (F 5) and (a). Thus $(-a) \cdot b = -(a \cdot b)$. The proof that $a \cdot (-b) = -(a \cdot b)$
is similar.
(c) By applying (b) twice, we find that

$$(-a) \cdot (-b) = -[a \cdot (-b)] = -[-(a \cdot b)] = a \cdot b. \quad ■$$

Corollary. The additive identity of a field has no multiplicative inverse.


In an arbitrary field F, it may happen that a sum $1 + 1 + \cdots + 1$ (p
summands) equals 0 for some positive integer p. For example, in the field $Z_2$
(defined in Example 4), $1 + 1 = 0$. In this case, the smallest positive integer p
for which a sum of p 1's equals 0 is called the characteristic of F; if no such
positive integer exists, then F is said to have characteristic zero. Thus $Z_2$
has characteristic two, and R has characteristic zero. Observe that if F is a
field of characteristic $p \neq 0$, then $x + x + \cdots + x$ (p summands) equals 0 for all
$x \in F$. In a field having nonzero characteristic (especially characteristic two),
many unnatural problems arise. For this reason, some of the results about
vector spaces stated in this book require that the field over which the vector
space is defined be of characteristic zero (or, at least, of some characteristic
other than two).

Finally, note that in other sections of this book, the product of two
elements a and b in a field is usually denoted ab rather than $a \cdot b$.
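The definition of the characteristic suggests a direct search: keep adding 1 until the running sum returns to 0. The Python sketch below (the function `characteristic` is hypothetical) runs this search with a cutoff `limit`. Finding p confirms the characteristic, but a result of `None` only means that no suitable p ≤ `limit` exists; it illustrates, and does not prove, characteristic zero.

```python
from fractions import Fraction


def characteristic(one, zero, add, limit=1000):
    """Smallest p <= limit with 1 + 1 + ... + 1 (p summands) == 0, or None."""
    total = zero
    for p in range(1, limit + 1):
        total = add(total, one)
        if total == zero:
            return p
    return None


print(characteristic(1, 0, lambda a, b: (a + b) % 2))               # 2: the field Z_2
print(characteristic(Fraction(1), Fraction(0), lambda a, b: a + b))  # None: rationals
# In characteristic p, x + x + ... + x (p summands) = 0 for every x; checked in Z_5
print(all(sum([x] * 5) % 5 == 0 for x in range(5)))                 # True
```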
APPENDIX D COMPLEX NUMBERS


For the purposes of algebra, the field of real numbers is not sufficient, for
there are polynomials of nonzero degree with real coefficients that have no
zeros in the field of real numbers (for example, $x^2 + 1$). It is often desirable
to have a field in which any polynomial of nonzero degree with coefficients
from that field has a zero in that field. It is possible to “enlarge” the field of
real numbers to obtain such a field.


Definitions. A complex number is an expression of the form $z = a + bi$,
where a and b are real numbers called the real part and the imaginary part
of z, respectively.

The sum and product of two complex numbers $z = a + bi$ and $w = c + di$
(where a, b, c, and d are real numbers) are defined, respectively, as follows:

$$z + w = (a + bi) + (c + di) = (a + c) + (b + d)i$$

and

$$zw = (a + bi)(c + di) = (ac - bd) + (bc + ad)i.$$


Example 1 


The sum and product of $z = 3 - 5i$ and $w = 9 + 7i$ are, respectively,

$$z + w = (3 - 5i) + (9 + 7i) = (3 + 9) + [(-5) + 7]i = 12 + 2i$$

and

$$zw = (3 - 5i)(9 + 7i) = [3 \cdot 9 - (-5) \cdot 7] + [(-5) \cdot 9 + 3 \cdot 7]i = 62 - 24i. \quad ◆$$
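The defining formulas translate directly into code if a complex number a + bi is stored as the pair (a, b) of its real and imaginary parts. In the Python sketch below, the helper names `c_add` and `c_mul` are our own. It implements the two definitions literally, reproduces Example 1 and the product computed in Example 2, confirms $i \cdot i = -1$, and cross-checks against Python's built-in `complex` type.

```python
def c_add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i, with numbers stored as pairs (a, b)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)


def c_mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (bc + ad)i."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, b * c + a * d)


z, w = (3, -5), (9, 7)          # z = 3 - 5i, w = 9 + 7i (Example 1)
print(c_add(z, w))              # (12, 2)
print(c_mul(z, w))              # (62, -24)
print(c_mul((0, 1), (0, 1)))    # (-1, 0): i * i = -1
print(c_mul((-5, 2), (1, -3)))  # (1, 17): the product in Example 2
# Cross-check against Python's built-in complex type
assert complex(*c_mul(z, w)) == complex(3, -5) * complex(9, 7)
```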


Any real number c may be regarded as a complex number by identifying c
with the complex number $c + 0i$. Observe that this correspondence preserves
sums and products; that is,

$$(c + 0i) + (d + 0i) = (c + d) + 0i \quad\text{and}\quad (c + 0i)(d + 0i) = cd + 0i.$$

Any complex number of the form $bi = 0 + bi$, where b is a nonzero real
number, is called imaginary. The product of two imaginary numbers is real
since

$$(bi)(di) = (0 + bi)(0 + di) = (0 - bd) + (b \cdot 0 + 0 \cdot d)i = -bd.$$

In particular, for $i = 0 + 1i$, we have $i \cdot i = -1$.

The observation that $i^2 = i \cdot i = -1$ provides an easy way to remember the
definition of multiplication of complex numbers: simply multiply two complex
numbers as you would any two algebraic expressions, and replace $i^2$ by $-1$.
Example 2 illustrates this technique.
Example 2
The product of $-5 + 2i$ and $1 - 3i$ is

$$\begin{aligned} (-5 + 2i)(1 - 3i) &= -5(1 - 3i) + 2i(1 - 3i) \\ &= -5 + 15i + 2i - 6i^2 \\ &= -5 + 15i + 2i - 6(-1) \\ &= 1 + 17i. \quad ◆ \end{aligned}$$


The real number 0, regarded as a complex number, is an additive identity
element for the complex numbers since

$$(a + bi) + 0 = (a + bi) + (0 + 0i) = (a + 0) + (b + 0)i = a + bi.$$

Likewise the real number 1, regarded as a complex number, is a multiplicative
identity element for the set of complex numbers since

$$(a + bi) \cdot 1 = (a + bi)(1 + 0i) = (a \cdot 1 - b \cdot 0) + (b \cdot 1 + a \cdot 0)i = a + bi.$$

Every complex number $a + bi$ has an additive inverse, namely $(-a) + (-b)i$.
But also each complex number except 0 has a multiplicative inverse. In fact,

$$(a + bi)^{-1} = \left(\frac{a}{a^2 + b^2}\right) - \left(\frac{b}{a^2 + b^2}\right) i.$$


In view of the preceding statements, the following result is not surprising. 


Theorem D.1. The set of complex numbers with the operations of addi- 
tion and multiplication previously defined is a field. 


Proof. Exercise. ■


Definition. The (complex) conjugate of a complex number $a + bi$ is
the complex number $a - bi$. We denote the conjugate of the complex number
z by $\bar{z}$.

Example 3
The conjugates of $-3 + 2i$, $4 - 7i$, and 6 are, respectively,

$$\overline{-3 + 2i} = -3 - 2i, \quad \overline{4 - 7i} = 4 + 7i, \quad\text{and}\quad \bar{6} = \overline{6 + 0i} = 6 - 0i = 6. \quad ◆$$


The next theorem contains some important properties of the conjugate of 
a complex number. 


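Theorem D.2. Let z and w be complex numbers. Then the following
statements are true.
(a) $\bar{\bar{z}} = z$.
(b) $\overline{(z + w)} = \bar{z} + \bar{w}$.
(c) $\overline{zw} = \bar{z} \cdot \bar{w}$.
(d) $\overline{\left(\dfrac{z}{w}\right)} = \dfrac{\bar{z}}{\bar{w}}$ if $w \neq 0$.
(e) z is a real number if and only if $\bar{z} = z$.


Proof. We leave the proofs of (a), (d), and (e) to the reader.
(b) Let $z = a + bi$ and $w = c + di$, where $a, b, c, d \in R$. Then

$$\begin{aligned} \overline{(z + w)} &= \overline{(a + c) + (b + d)i} = (a + c) - (b + d)i \\ &= (a - bi) + (c - di) = \bar{z} + \bar{w}. \end{aligned}$$

(c) For z and w, we have

$$\begin{aligned} \overline{zw} &= \overline{(a + bi)(c + di)} = \overline{(ac - bd) + (ad + bc)i} \\ &= (ac - bd) - (ad + bc)i = (a - bi)(c - di) = \bar{z} \cdot \bar{w}. \quad ■ \end{aligned}$$

For any complex number $z = a + bi$, $z\bar{z}$ is real and nonnegative, for

$$z\bar{z} = (a + bi)(a - bi) = a^2 + b^2.$$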

This fact can be used to define the absolute value of a complex number. 

Definition. Let $z = a + bi$, where $a, b \in R$. The absolute value (or
modulus) of z is the real number $\sqrt{a^2 + b^2}$. We denote the absolute value
of z by $|z|$.


Observe that $z\bar{z} = |z|^2$. The fact that the product of a complex number
and its conjugate is real provides an easy method for determining the quotient
of two complex numbers; for if $c + di \neq 0$, then

$$\frac{a + bi}{c + di} = \frac{a + bi}{c + di} \cdot \frac{c - di}{c - di} = \frac{(ac + bd) + (bc - ad)i}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\, i.$$

Example 4


To illustrate this procedure, we compute the quotient $(1 + 4i)/(3 - 2i)$:

$$\frac{1 + 4i}{3 - 2i} = \frac{1 + 4i}{3 - 2i} \cdot \frac{3 + 2i}{3 + 2i} = \frac{-5 + 14i}{9 + 4} = -\frac{5}{13} + \frac{14}{13}\, i. \quad ◆$$
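The conjugate trick gives a division formula that stays within exact arithmetic when the real and imaginary parts are rational. A minimal Python sketch, storing a + bi as the pair (a, b); the helper name `c_div` is our own. It reproduces Example 4 and, taking the numerator 1 + 0i, the inverse formula stated earlier.

```python
from fractions import Fraction


def c_div(z, w):
    """(a + bi)/(c + di) = [(ac + bd) + (bc - ad)i] / (c^2 + d^2) for c + di != 0."""
    (a, b), (c, d) = z, w
    denom = c * c + d * d
    if denom == 0:
        raise ZeroDivisionError("division by the complex number 0")
    return (Fraction(a * c + b * d, denom), Fraction(b * c - a * d, denom))


print(c_div((1, 4), (3, -2)))   # (Fraction(-5, 13), Fraction(14, 13)): Example 4
print(c_div((1, 0), (3, -2)))   # (Fraction(3, 13), Fraction(2, 13)): (3 - 2i)^(-1)
```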


The absolute value of a complex number has the familiar properties of the
absolute value of a real number, as the following result shows.


Theorem D.3. Let z and w denote any two complex numbers. Then the
following statements are true.
(a) $|zw| = |z| \cdot |w|$.
(b) $\left|\dfrac{z}{w}\right| = \dfrac{|z|}{|w|}$ if $w \neq 0$.
(c) $|z + w| \le |z| + |w|$.
(d) $|z| - |w| \le |z + w|$.


Proof. (a) By Theorem D.2, we have

$$|zw|^2 = (zw)\overline{(zw)} = (zw)(\bar{z} \cdot \bar{w}) = (z\bar{z})(w\bar{w}) = |z|^2 |w|^2,$$

proving (a).
(b) For the proof of (b), apply (a) to the product $\left(\dfrac{z}{w}\right) w$.


(c) For any complex number $x = a + bi$, where $a, b \in R$, observe that

$$x + \bar{x} = (a + bi) + (a - bi) = 2a \le 2\sqrt{a^2 + b^2} = 2|x|.$$

Thus $x + \bar{x}$ is real and satisfies the inequality $x + \bar{x} \le 2|x|$. Taking $x = w\bar{z}$,
we have, by Theorem D.2 and (a),

$$w\bar{z} + \bar{w}z \le 2|w\bar{z}| = 2|w||\bar{z}| = 2|z||w|.$$

Using Theorem D.2 again gives

$$\begin{aligned} |z + w|^2 &= (z + w)\overline{(z + w)} = (z + w)(\bar{z} + \bar{w}) = z\bar{z} + w\bar{z} + z\bar{w} + w\bar{w} \\ &\le |z|^2 + 2|z||w| + |w|^2 = (|z| + |w|)^2. \end{aligned}$$

By taking square roots, we obtain (c).
(d) From (a) and (c), it follows that

$$|z| = |(z + w) - w| \le |z + w| + |-w| = |z + w| + |w|.$$

So

$$|z| - |w| \le |z + w|,$$

proving (d). ■
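The four properties in Theorem D.3 can be spot-checked numerically with Python's built-in `complex` type, for which `abs` returns the modulus. The sketch below (the helper `check_theorem_d3` is hypothetical) allows a small tolerance for floating-point rounding and skips (b) when w = 0. A random sample illustrates the theorem; it is not a substitute for the proof above.

```python
import math
import random


def check_theorem_d3(z, w, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if (a)-(d) of Theorem D.3 hold for z, w up to rounding error."""
    a = math.isclose(abs(z * w), abs(z) * abs(w), rel_tol=rel_tol, abs_tol=abs_tol)
    b = w == 0 or math.isclose(abs(z / w), abs(z) / abs(w),
                               rel_tol=rel_tol, abs_tol=abs_tol)
    c = abs(z + w) <= abs(z) + abs(w) + abs_tol      # triangle inequality
    d = abs(z) - abs(w) <= abs(z + w) + abs_tol
    return a and b and c and d


random.seed(0)
samples = [(complex(random.uniform(-10, 10), random.uniform(-10, 10)),
            complex(random.uniform(-10, 10), random.uniform(-10, 10)))
           for _ in range(1000)]
print(all(check_theorem_d3(z, w) for z, w in samples))   # True
```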


It is interesting as well as useful that complex numbers have both a
geometric and an algebraic representation. Suppose that $z = a + bi$, where a
and b are real numbers. We may represent z as a vector in the complex plane
(see Figure D.1(a)). Notice that, as in $R^2$, there are two axes, the real axis
and the imaginary axis. The real and imaginary parts of z are the first and
second coordinates, and the absolute value of z gives the length of the vector
z. It is clear that addition of complex numbers may be represented as in $R^2$
using the parallelogram law.
[Figure D.1: (a) the complex number $z = a + bi$ plotted against the real axis and the imaginary axis; (b) the unit vector $e^{i\theta}$ and $z = |z|e^{i\phi}$.]


In Section 2.7 (p. 132), we introduce Euler's formula. The special case
$e^{i\theta} = \cos\theta + i\sin\theta$ is of particular interest. Because of the geometry we have
introduced, we may represent the vector $e^{i\theta}$ as in Figure D.1(b); that is, $e^{i\theta}$
is the unit vector that makes an angle θ with the positive real axis. From
this figure, we see that any nonzero complex number z may be depicted as
a multiple of a unit vector, namely, $z = |z|e^{i\phi}$, where φ is the angle that the
vector z makes with the positive real axis. Thus multiplication, as well as
addition, has a simple geometric interpretation: If $z = |z|e^{i\theta}$ and $w = |w|e^{i\omega}$
are two nonzero complex numbers, then from the properties established in
Section 2.7 and Theorem D.3, we have

$$zw = |z|e^{i\theta} \cdot |w|e^{i\omega} = |zw|e^{i(\theta + \omega)}.$$

So zw is the vector whose length is the product of the lengths of z and w,
and makes the angle $\theta + \omega$ with the positive real axis.
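Python's standard `cmath` module converts between the algebraic form a + bi and the polar form $|z|e^{i\phi}$: `cmath.polar` returns the pair (length, angle) and `cmath.rect` rebuilds the number from them. The sketch below, with sample values $z = 1 + i$ and $w = 2i$ chosen for illustration, checks Euler's formula and confirms that multiplying the lengths and adding the angles reproduces the product zw, up to rounding.

```python
import cmath
import math

z = complex(1, 1)     # |z| = sqrt(2), angle pi/4
w = complex(0, 2)     # |w| = 2, angle pi/2

r_z, theta = cmath.polar(z)
r_w, omega = cmath.polar(w)
# Euler's formula: e^{i theta} = cos(theta) + i sin(theta)
assert cmath.isclose(cmath.exp(1j * theta), complex(math.cos(theta), math.sin(theta)))
# Multiply the lengths and add the angles
zw_polar = cmath.rect(r_z * r_w, theta + omega)
print(z * w, zw_polar)   # the exact product is -2 + 2i
```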

Our motivation for enlarging the set of real numbers to the set of complex 
numbers is to obtain a field such that every polynomial with nonzero degree 
having coefficients in that field has a zero. Our next result guarantees that 
the field of complex numbers has this property. 


Theorem D.4 (The Fundamental Theorem of Algebra). Suppose that $p(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0$ is a polynomial in $\mathsf{P}(C)$ of degree $n \ge 1$. Then $p(z)$ has a zero.


The following proof is based on one in the book Principles of Mathematical Analysis, 3d ed., by Walter Rudin (McGraw-Hill Higher Education, New York, 1976).


Proof. We want to find $z_0$ in $C$ such that $p(z_0) = 0$. Let $m$ be the greatest lower bound of $\{|p(z)| : z \in C\}$. For $|z| = s > 0$, we have

$$\begin{aligned}
|p(z)| &= |a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0| \\
&\ge |a_n||z^n| - |a_{n-1}||z|^{n-1} - \cdots - |a_0| \\
&= |a_n|s^n - |a_{n-1}|s^{n-1} - \cdots - |a_0| \\
&= s^n\left[|a_n| - |a_{n-1}|s^{-1} - \cdots - |a_0|s^{-n}\right].
\end{aligned}$$


Because the last expression approaches infinity as $s$ approaches infinity, we may choose a closed disk $D$ about the origin such that $|p(z)| > m + 1$ if $z$ is not in $D$. It follows that $m$ is the greatest lower bound of $\{|p(z)| : z \in D\}$. Because $D$ is closed and bounded and $p(z)$ is continuous, there exists $z_0$ in $D$ such that $|p(z_0)| = m$. We want to show that $m = 0$. We argue by contradiction.


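Assume that $m \ne 0$. Let $q(z) = \dfrac{p(z + z_0)}{p(z_0)}$. Then $q(z)$ is a polynomial of degree $n$, $q(0) = 1$, and $|q(z)| \ge 1$ for all $z$ in $C$ (because $|p(z + z_0)| \ge m = |p(z_0)|$). So we may write

$$q(z) = 1 + b_k z^k + b_{k+1} z^{k+1} + \cdots + b_n z^n,$$

where $b_k \ne 0$. Because $-\dfrac{|b_k|}{b_k}$ has modulus one, we may pick a real number $\theta$ such that $e^{ik\theta} = -\dfrac{|b_k|}{b_k}$, or $e^{ik\theta} b_k = -|b_k|$. For any $r > 0$, we have

$$\begin{aligned}
q(re^{i\theta}) &= 1 + b_k r^k e^{ik\theta} + b_{k+1} r^{k+1} e^{i(k+1)\theta} + \cdots + b_n r^n e^{in\theta} \\
&= 1 - |b_k| r^k + b_{k+1} r^{k+1} e^{i(k+1)\theta} + \cdots + b_n r^n e^{in\theta}.
\end{aligned}$$

Choose $r$ small enough so that $1 - |b_k| r^k > 0$. Then

$$\begin{aligned}
|q(re^{i\theta})| &\le 1 - |b_k| r^k + |b_{k+1}| r^{k+1} + \cdots + |b_n| r^n \\
&= 1 - r^k\left[|b_k| - |b_{k+1}| r - \cdots - |b_n| r^{n-k}\right].
\end{aligned}$$

Now choose $r$ even smaller, if necessary, so that the expression within the brackets is positive. We obtain that $|q(re^{i\theta})| < 1$. But this is a contradiction. ∎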
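The key step of the proof can be checked numerically. The angle $\theta$ is chosen so that $b_k r^k e^{ik\theta} = -|b_k| r^k$ points straight back toward the origin. The Python sketch below takes $\theta = (\pi - \arg b_k)/k$, which gives $e^{ik\theta} = -|b_k|/b_k$. It evaluates a hypothetical polynomial $q(z) = 1 + (2 + i)z^2 + z^3$, chosen for illustration and not taken from the text, so that $k = 2$ and $b_k = 2 + i$. For each small $r$ it should report $|q(re^{i\theta})| < 1$, which is the inequality the proof relies on.

```python
import cmath


def descent_direction(b_k: complex, k: int) -> float:
    """Return theta with e^{i k theta} * b_k = -|b_k|."""
    return (cmath.pi - cmath.phase(b_k)) / k


def q(z: complex) -> complex:
    # Hypothetical example: q(0) = 1, lowest nonconstant term b_k z^k with k = 2, b_k = 2 + i.
    return 1 + (2 + 1j) * z**2 + z**3


b_k, k = 2 + 1j, 2
theta = descent_direction(b_k, k)
assert abs(cmath.exp(1j * k * theta) * b_k + abs(b_k)) < 1e-12  # e^{ik theta} b_k = -|b_k|
for r in (0.2, 0.1, 0.01):
    print(r, abs(q(r * cmath.exp(1j * theta))))  # each value is below 1
```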


The following important corollary is a consequence of Theorem D.4 and 
the division algorithm for polynomials (Theorem E.1). 


Corollary. If $p(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0$ is a polynomial of degree $n \ge 1$ with complex coefficients, then there exist complex numbers $c_1, c_2, \ldots, c_n$ (not necessarily distinct) such that

$$p(z) = a_n(z - c_1)(z - c_2)\cdots(z - c_n).$$

Proof. Exercise. ∎


A field is called algebraically closed if it has the property that every 
polynomial of positive degree with coefficients from that field factors as a 
product of polynomials of degree 1. Thus the preceding corollary asserts that 
the field of complex numbers is algebraically closed. 
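The corollary guarantees that $c_1, \ldots, c_n$ exist but does not say how to find them. As an illustration only, the Python sketch below approximates them with the Durand-Kerner (Weierstrass) iteration, a numerical method that is not part of the text. It then checks the factorization $p(z) = a_n(z - c_1)\cdots(z - c_n)$ at a sample point. The starting points $(0.4 + 0.9i)^k$, the iteration cap, and the tolerance are conventional but arbitrary choices. Convergence is typical but not guaranteed for every polynomial, and repeated zeros converge slowly.

```python
def evaluate(coeffs, z):
    """Evaluate a_n z^n + ... + a_0 by Horner's rule; coeffs = [a_n, ..., a_0]."""
    value = 0j
    for a in coeffs:
        value = value * z + a
    return value


def durand_kerner(coeffs, max_iter=1000, tol=1e-14):
    """Approximate all n complex zeros of a degree-n polynomial (coeffs = [a_n, ..., a_0])."""
    if not coeffs or coeffs[0] == 0:
        raise ValueError("leading coefficient must be nonzero")
    n = len(coeffs) - 1
    monic = [a / coeffs[0] for a in coeffs]          # same zeros, leading coefficient 1
    roots = [(0.4 + 0.9j) ** k for k in range(n)]    # distinct starting guesses
    for _ in range(max_iter):
        new_roots = []
        for i, r in enumerate(roots):
            denom = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            new_roots.append(r - evaluate(monic, r) / denom)
        change = max(abs(a - b) for a, b in zip(new_roots, roots))
        roots = new_roots
        if change < tol:
            break
    return roots


def product_form(coeffs, roots, z):
    """Evaluate a_n (z - c_1)(z - c_2)...(z - c_n)."""
    value = coeffs[0] + 0j
    for c in roots:
        value *= z - c
    return value


p = [1, -6, 11, -6]   # z^3 - 6z^2 + 11z - 6 = (z - 1)(z - 2)(z - 3)
zeros = durand_kerner(p)
print(sorted(round(c.real, 6) for c in zeros))               # expected close to [1.0, 2.0, 3.0]
print(abs(product_form(p, zeros, 0.5j) - evaluate(p, 0.5j)))  # expected close to 0
```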
APPENDIX E POLYNOMIALS


In this appendix, we discuss some useful properties of polynomials with coefficients from a field. For the definition of a polynomial, refer to Section 1.2. Throughout this appendix, we assume that all polynomials have coefficients from a fixed field $F$.


Definition. A polynomial $f(x)$ divides a polynomial $g(x)$ if there exists a polynomial $q(x)$ such that $g(x) = f(x)q(x)$.

Our first result shows that the familiar long division process for polynomials with real coefficients is valid for polynomials with coefficients from an arbitrary field.


Theorem E.1 (The Division Algorithm for Polynomials). Let $f(x)$ be a polynomial of degree $n$, and let $g(x)$ be a polynomial of degree $m \ge 0$. Then there exist unique polynomials $q(x)$ and $r(x)$ such that

$$f(x) = q(x)g(x) + r(x), \tag{1}$$

where the degree of $r(x)$ is less than $m$.


Proof. We begin by establishing the existence of $q(x)$ and $r(x)$ that satisfy (1).

CASE 1. If $n < m$, take $q(x) = 0$ and $r(x) = f(x)$ to satisfy (1).

CASE 2. When $0 \le m \le n$, we apply mathematical induction on $n$. First suppose that $n = 0$. Then $m = 0$, and it follows that $f(x)$ and $g(x)$ are nonzero constants. Hence we may take $q(x) = f(x)/g(x)$ and $r(x) = 0$ to satisfy (1).

Now suppose that the result is valid for all polynomials with degree less than $n$ for some fixed $n > 0$, and assume that $f(x)$ has degree $n$. Suppose that

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

and

$$g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0,$$

and let $h(x)$ be the polynomial defined by

$$h(x) = f(x) - a_n b_m^{-1} x^{n-m} g(x). \tag{2}$$

Then $h(x)$ is a polynomial of degree less than $n$ (the terms of degree $n$ cancel), and therefore we may apply the induction hypothesis or CASE 1 (whichever is relevant) to obtain polynomials $q_1(x)$ and $r(x)$ such that $r(x)$ has degree less than $m$ and

$$h(x) = q_1(x)g(x) + r(x). \tag{3}$$
Combining (2) and (3) and solving for $f(x)$ gives us $f(x) = q(x)g(x) + r(x)$ with $q(x) = a_n b_m^{-1} x^{n-m} + q_1(x)$, which establishes (1) for any $n \ge 0$ by mathematical induction. This establishes the existence of $q(x)$ and $r(x)$.

We now show the uniqueness of $q(x)$ and $r(x)$. Suppose that $q_1(x)$, $q_2(x)$, $r_1(x)$, and $r_2(x)$ exist such that $r_1(x)$ and $r_2(x)$ each has degree less than $m$ and

$$f(x) = q_1(x)g(x) + r_1(x) = q_2(x)g(x) + r_2(x).$$

Then

$$[q_1(x) - q_2(x)]\,g(x) = r_2(x) - r_1(x). \tag{4}$$

The right side of (4) is a polynomial of degree less than $m$. Since $g(x)$ has degree $m$, the left side of (4) would have degree at least $m$ if $q_1(x) - q_2(x)$ were nonzero, so $q_1(x) - q_2(x)$ must be the zero polynomial. Hence $q_1(x) = q_2(x)$; thus $r_1(x) = r_2(x)$ by (4). ∎


In the context of Theorem E.1, we call $q(x)$ and $r(x)$ the quotient and remainder, respectively, for the division of $f(x)$ by $g(x)$. For example, suppose that $F$ is the field of complex numbers. Then the quotient and remainder for the division of

$$f(x) = (3 + i)x^5 - (1 - i)x^4 + 6x^3 + (-6 + 2i)x^2 + (2 + i)x + 1$$

by

$$g(x) = (3 + i)x^2 - 2ix + 4$$

are, respectively,

$$q(x) = x^3 + ix^2 - 2 \quad\text{and}\quad r(x) = (2 - 3i)x + 9.$$
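The existence half of the proof of Theorem E.1 is the familiar long-division loop. It repeatedly subtracts $a_n b_m^{-1} x^{n-m} g(x)$, as in (2), until the degree drops below $m$. The Python sketch below follows that loop. It stores a polynomial as a list of coefficients from the constant term upward, which is a convention of this sketch and not of the text. It works with any coefficients that support $+$, $-$, $\times$, and $\div$, such as Python's complex numbers. Applied to the example above, it should reproduce the quotient and remainder up to floating-point rounding. Here $f(x)$ is taken to be $q(x)g(x) + r(x)$ with $g(x) = (3 + i)x^2 - 2ix + 4$, as written in the code comments.

```python
def trim(p):
    """Drop zero leading coefficients; [] represents the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g.

    Polynomials are lists [c_0, c_1, ..., c_k] meaning c_0 + c_1 x + ... + c_k x^k.
    """
    f, g = trim(f), trim(g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    m = len(g) - 1                     # degree of g
    r = list(f)
    q = [0] * max(len(f) - m, 0)       # room for the terms x^0 .. x^(n-m)
    while len(r) > m:                  # while deg r >= deg g
        shift = len(r) - 1 - m         # exponent of the next quotient term
        c = r[-1] / g[-1]              # leading coefficient ratio a_n * b_m^{-1}
        q[shift] = c
        for i in range(m):             # subtract c * x^shift * g(x) from the lower terms
            r[shift + i] -= c * g[i]
        r.pop()                        # the leading term cancels by construction
        r = trim(r)
    return trim(q), r


# f(x) = (3+i)x^5 - (1-i)x^4 + 6x^3 + (-6+2i)x^2 + (2+i)x + 1, constant term first
f = [1, 2 + 1j, -6 + 2j, 6, -1 + 1j, 3 + 1j]
# g(x) = (3+i)x^2 - 2ix + 4
g = [4, -2j, 3 + 1j]
q, r = poly_divmod(f, g)
print(q, r)   # expected approximately q = [-2, 0, 1j, 1], r = [9, 2-3j]
```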


Corollary 1. Let $f(x)$ be a polynomial of positive degree, and let $a \in F$. Then $f(a) = 0$ if and only if $x - a$ divides $f(x)$.

Proof. Suppose that $x - a$ divides $f(x)$. Then there exists a polynomial $q(x)$ such that $f(x) = (x - a)q(x)$. Thus $f(a) = (a - a)q(a) = 0 \cdot q(a) = 0$.

Conversely, suppose that $f(a) = 0$. By the division algorithm, there exist polynomials $q(x)$ and $r(x)$ such that $r(x)$ has degree less than one and

$$f(x) = q(x)(x - a) + r(x).$$

Substituting $a$ for $x$ in the equation above, we obtain $r(a) = 0$. Since $r(x)$ has degree less than 1, it must be the constant polynomial $r(x) = 0$. Thus

$$f(x) = q(x)(x - a).$$ ∎
For any polynomial $f(x)$ with coefficients from a field $F$, an element $a \in F$ is called a zero of $f(x)$ if $f(a) = 0$. With this terminology, the preceding corollary states that $a$ is a zero of $f(x)$ if and only if $x - a$ divides $f(x)$.
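Corollary 1 can be checked with synthetic division (Horner's scheme), which divides $f(x)$ by $x - a$ in a single pass. The last value computed is the remainder, and it equals $f(a)$. The Python sketch below lists coefficients from the leading one down, which is a convention of this sketch, and returns the quotient and the remainder. A remainder of 0 means that $a$ is a zero of $f(x)$, so $x - a$ divides $f(x)$. The sample polynomial is an arbitrary illustrative choice.

```python
def synthetic_division(coeffs, a):
    """Divide f(x) by (x - a).

    coeffs = [a_n, ..., a_1, a_0] (leading coefficient first).
    Returns (quotient coefficients, remainder); the remainder equals f(a).
    """
    if len(coeffs) < 2:
        raise ValueError("f(x) must have positive degree")
    running = []
    value = 0
    for c in coeffs:
        value = value * a + c   # Horner step
        running.append(value)
    return running[:-1], running[-1]


# f(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
f = [1, -6, 11, -6]
print(synthetic_division(f, 2))   # ([1, -4, 3], 0): x - 2 divides f(x)
print(synthetic_division(f, 4))   # ([1, -2, 3], 6): f(4) = 6, so 4 is not a zero
```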


Corollary 2. Any polynomial of degree $n \ge 1$ has at most $n$ distinct zeros.

Proof. The proof is by mathematical induction on $n$. The result is obvious if $n = 1$. Now suppose that the result is true for some positive integer $n$, and let $f(x)$ be a polynomial of degree $n + 1$. If $f(x)$ has no zeros, then there is nothing to prove. Otherwise, if $a$ is a zero of $f(x)$, then by Corollary 1 we may write $f(x) = (x - a)q(x)$ for some polynomial $q(x)$. Note that $q(x)$ must be of degree $n$; therefore, by the induction hypothesis, $q(x)$ can have at most $n$ distinct zeros. Any zero $b$ of $f(x)$ distinct from $a$ is also a zero of $q(x)$, because $0 = f(b) = (b - a)q(b)$ and $b - a \ne 0$. It follows that $f(x)$ can have at most $n + 1$ distinct zeros. ∎


Polynomials having no common divisors arise naturally in the study of 
canonical forms. (See Chapter 7.) 


Definition. Two nonzero polynomials are called relatively prime if no 
polynomial of positive degree divides each of them. 


For example, the polynomials with real coefficients $f(x) = x^2(x - 1)$ and $h(x) = (x - 1)(x - 2)$ are not relatively prime because $x - 1$ divides each of them. On the other hand, consider $f(x)$ and $g(x) = (x - 2)(x - 3)$, which do not appear to have common factors. Could other factorizations of $f(x)$ and $g(x)$ reveal a hidden common factor? We will soon see (Theorem E.9) that the preceding factors are the only ones. Thus $f(x)$ and $g(x)$ are relatively prime because they have no common factors of positive degree.


Theorem E.2. If $f_1(x)$ and $f_2(x)$ are relatively prime polynomials, there exist polynomials $q_1(x)$ and $q_2(x)$ such that

$$q_1(x)f_1(x) + q_2(x)f_2(x) = 1,$$

where 1 denotes the constant polynomial with value 1.

Proof. Without loss of generality, assume that the degree of $f_1(x)$ is greater than or equal to the degree of $f_2(x)$. The proof is by mathematical induction on the degree of $f_2(x)$. If $f_2(x)$ has degree 0, then $f_2(x)$ is a nonzero constant $c$. In this case, we can take $q_1(x) = 0$ and $q_2(x) = 1/c$.

Now suppose that the theorem holds whenever the polynomial of lesser degree has degree less than $n$ for some positive integer $n$, and suppose that $f_2(x)$ has degree $n$. By the division algorithm, there exist polynomials $q(x)$ and $r(x)$ such that $r(x)$ has degree less than $n$ and

$$f_1(x) = q(x)f_2(x) + r(x). \tag{5}$$
Since $f_1(x)$ and $f_2(x)$ are relatively prime, $r(x)$ is not the zero polynomial. We claim that $f_2(x)$ and $r(x)$ are relatively prime. Suppose otherwise; then there exists a polynomial $g(x)$ of positive degree that divides both $f_2(x)$ and $r(x)$. Hence, by (5), $g(x)$ also divides $f_1(x)$, contradicting the fact that $f_1(x)$ and $f_2(x)$ are relatively prime. Since $r(x)$ has degree less than $n$, we may apply the induction hypothesis to $f_2(x)$ and $r(x)$. Thus there exist polynomials $g_1(x)$ and $g_2(x)$ such that

$$g_1(x)f_2(x) + g_2(x)r(x) = 1. \tag{6}$$

Combining (5) and (6), we have

$$\begin{aligned}
1 &= g_1(x)f_2(x) + g_2(x)\left[f_1(x) - q(x)f_2(x)\right] \\
&= g_2(x)f_1(x) + \left[g_1(x) - g_2(x)q(x)\right]f_2(x).
\end{aligned}$$

Thus, setting $q_1(x) = g_2(x)$ and $q_2(x) = g_1(x) - g_2(x)q(x)$, we obtain the desired result. ∎


Example 1
Let $f_1(x) = x^3 - x^2 + 1$ and $f_2(x) = (x - 1)^2$. As polynomials with real coefficients, $f_1(x)$ and $f_2(x)$ are relatively prime. It is easily verified that the polynomials $q_1(x) = -x + 2$ and $q_2(x) = x^2 - x - 1$ satisfy

$$q_1(x)f_1(x) + q_2(x)f_2(x) = 1,$$

and hence these polynomials satisfy the conclusion of Theorem E.2.
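The induction in the proof of Theorem E.2 is essentially the extended Euclidean algorithm. Each step divides one polynomial by the other, as in (5), and carries the linear combination back, as in (6). The Python sketch below implements that algorithm with exact rational arithmetic (`fractions.Fraction`), so it assumes rational coefficients. Coefficient lists run from the constant term upward, which is a convention of this sketch. For the polynomials of Example 1 it returns $q_1(x) = -x + 2$ and $q_2(x) = x^2 - x - 1$. These agree with the example because the coefficients are unique once $\deg q_1 < \deg f_2$ and $\deg q_2 < \deg f_1$.

```python
from fractions import Fraction


def trim(p):
    """Remove zero leading coefficients; [] is the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def sub(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_divmod(f, g):
    """Exact division with remainder: f = q*g + r with deg r < deg g."""
    r, g = trim(f), trim(g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    m = len(g) - 1
    q = [Fraction(0)] * max(len(r) - m, 0)
    while len(r) > m:
        shift = len(r) - 1 - m
        c = r[-1] / g[-1]
        q[shift] = c
        r = sub(r, [Fraction(0)] * shift + [c * b for b in g])  # leading term cancels exactly
    return trim(q), r


def bezout(f1, f2):
    """Return (q1, q2) with q1*f1 + q2*f2 = 1, assuming f1 and f2 are relatively prime."""
    r0, r1 = trim(f1), trim(f2)
    s0, s1 = [Fraction(1)], []          # invariant: r_i = s_i*f1 + t_i*f2
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, sub(s0, mul(q, s1))
        t0, t1 = t1, sub(t0, mul(q, t1))
    if len(r0) != 1:                    # the last nonzero remainder is a gcd
        raise ValueError("the polynomials are not relatively prime")
    c = r0[0]
    return [a / c for a in s0], [a / c for a in t0]


f1 = [Fraction(a) for a in (1, 0, -1, 1)]   # x^3 - x^2 + 1
f2 = [Fraction(a) for a in (1, -2, 1)]      # (x - 1)^2
q1, q2 = bezout(f1, f2)
print(q1, q2)   # q1 = 2 - x, q2 = -1 - x + x^2 (as Fraction coefficients)
```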


Throughout Chapters 5, 6, and 7, we consider linear operators that are 
polynomials in a particular operator T and matrices that are polynomials in a 
particular matrix A. For these operators and matrices, the following notation 
is convenient. 


Definitions. Let

$$f(x) = a_0 + a_1 x + \cdots + a_n x^n$$

be a polynomial with coefficients from a field $F$. If $T$ is a linear operator on a vector space $V$ over $F$, we define

$$f(T) = a_0 I + a_1 T + \cdots + a_n T^n.$$

Similarly, if $A$ is an $n \times n$ matrix with entries from $F$, we define

$$f(A) = a_0 I + a_1 A + \cdots + a_n A^n.$$
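Example 2
Let $T$ be the linear operator on $\mathbb{R}^2$ defined by $T(a, b) = (2a + b, a - b)$, and let $f(x) = x^2 + 2x - 3$. It is easily checked that $T^2(a, b) = (5a + b, a + 2b)$; so

$$\begin{aligned}
f(T)(a, b) &= (T^2 + 2T - 3I)(a, b) \\
&= (5a + b, a + 2b) + (4a + 2b, 2a - 2b) - 3(a, b) \\
&= (6a + 3b, 3a - 3b).
\end{aligned}$$

Similarly, if

$$A = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix},$$

then

$$f(A) = A^2 + 2A - 3I = \begin{pmatrix} 5 & 1 \\ 1 & 2 \end{pmatrix} + 2\begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} - 3\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 6 & 3 \\ 3 & -3 \end{pmatrix}.$$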
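The computation of $f(A)$ can be automated with Horner's rule applied to matrices. The Python sketch below uses plain lists of rows and no external library. It takes $A$ to be the matrix of $T$ in the standard basis, so by Theorem E.3(b) the matrix $f(A)$ should also represent $f(T)$. The helper names are illustrative, and coefficients are listed from the constant term upward.

```python
def mat_mul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def poly_of_matrix(coeffs, A):
    """Return f(A) = a_0 I + a_1 A + ... + a_n A^n for coeffs = [a_0, a_1, ..., a_n].

    Horner's rule: f(A) = (...((a_n A + a_{n-1} I) A + a_{n-2} I) ...) A + a_0 I.
    """
    n = len(A)
    I = identity(n)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = mat_mul(result, A)
        result = [[result[i][j] + a * I[i][j] for j in range(n)] for i in range(n)]
    return result


A = [[2, 1], [1, -1]]   # matrix of T(a, b) = (2a + b, a - b) in the standard basis
f = [-3, 2, 1]          # f(x) = x^2 + 2x - 3, constant term first
fA = poly_of_matrix(f, A)
print(fA)               # [[6, 3], [3, -3]]

a, b = 1, 2
print([fA[0][0] * a + fA[0][1] * b, fA[1][0] * a + fA[1][1] * b])  # [12, -3] = (6a + 3b, 3a - 3b)
```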


The next three results use this notation. 


Theorem E.3. Let $f(x)$ be a polynomial with coefficients from a field $F$, and let $T$ be a linear operator on a vector space $V$ over $F$. Then the following statements are true.

(a) $f(T)$ is a linear operator on $V$.
(b) If $\beta$ is a finite ordered basis for $V$ and $A = [T]_\beta$, then $[f(T)]_\beta = f(A)$.

Proof. Exercise. ∎


Theorem E.4. Let $T$ be a linear operator on a vector space $V$ over a field $F$, and let $A$ be a square matrix with entries from $F$. Then, for any polynomials $f_1(x)$ and $f_2(x)$ with coefficients from $F$,

(a) $f_1(T)f_2(T) = f_2(T)f_1(T)$
(b) $f_1(A)f_2(A) = f_2(A)f_1(A)$.

Proof. Exercise. ∎


Theorem E.5. Let $T$ be a linear operator on a vector space $V$ over a field $F$, and let $A$ be an $n \times n$ matrix with entries from $F$. If $f_1(x)$ and $f_2(x)$ are relatively prime polynomials with coefficients from $F$, then there exist polynomials $q_1(x)$ and $q_2(x)$ with coefficients from $F$ such that

(a) $q_1(T)f_1(T) + q_2(T)f_2(T) = I$
(b) $q_1(A)f_1(A) + q_2(A)f_2(A) = I$.

Proof. Exercise. ∎
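Polynomials in the same matrix commute (Theorem E.4), so the polynomial identity of Example 1 still holds after any square matrix is substituted for $x$. The Python sketch below checks part (b) of Theorem E.5 for $f_1(x) = x^3 - x^2 + 1$, $f_2(x) = (x - 1)^2$, and the $q_1$, $q_2$ from Example 1. The test matrices are hypothetical integer matrices chosen only for illustration, and integer arithmetic keeps the check exact.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(A, B):
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def poly_of_matrix(coeffs, A):
    """f(A) for coeffs = [a_0, a_1, ..., a_n], evaluated by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        scaled_identity = [[a * x for x in row] for row in identity(n)]
        result = mat_add(mat_mul(result, A), scaled_identity)
    return result


# Example 1: q1*f1 + q2*f2 = 1 as polynomials (coefficients listed from the constant term up).
f1 = [1, 0, -1, 1]      # x^3 - x^2 + 1
f2 = [1, -2, 1]         # (x - 1)^2
q1 = [2, -1]            # -x + 2
q2 = [-1, -1, 1]        # x^2 - x - 1


def bezout_combination(A):
    """Return q1(A) f1(A) + q2(A) f2(A), which Theorem E.5(b) predicts is I."""
    return mat_add(mat_mul(poly_of_matrix(q1, A), poly_of_matrix(f1, A)),
                   mat_mul(poly_of_matrix(q2, A), poly_of_matrix(f2, A)))


A = [[1, 2], [3, 4]]    # hypothetical test matrix
print(bezout_combination(A) == identity(2))   # True
```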
In Chapters 5 and 7, we are concerned with determining when a linear
operator T on a finite-dimensional vector space can be diagonalized and with 
finding a simple (canonical) representation of T. Both of these problems are 
affected by the factorization of a certain polynomial determined by T (the 
characteristic polynomial of T). In this setting, particular types of polynomi- 
als play an important role. 


Definitions. A polynomial $f(x)$ with coefficients from a field $F$ is called monic if its leading coefficient is 1. If $f(x)$ has positive degree and cannot be expressed as a product of polynomials with coefficients from $F$ each having positive degree, then $f(x)$ is called irreducible.


Observe that whether a polynomial is irreducible depends on the field $F$ from which its coefficients come. For example, $f(x) = x^2 + 1$ is irreducible over the field of real numbers, but it is not irreducible over the field of complex numbers since $x^2 + 1 = (x + i)(x - i)$.

Clearly any polynomial of degree 1 is irreducible. Moreover, for polynomials with coefficients from an algebraically closed field, the polynomials of degree 1 are the only irreducible polynomials.
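The dependence on the field can also be seen computationally over the finite fields $Z_p$ ($p$ prime). These fields are not the ones in the paragraph above and serve only as a computable illustration. The Python sketch below uses a fact that follows from Corollary 1 to Theorem E.1: a polynomial of degree 2 or 3 is reducible over a field exactly when it has a zero there, because any factorization must include a factor of degree 1. For that reason the test is restricted to degrees 2 and 3, and the helper names are illustrative.

```python
def has_zero_mod_p(coeffs, p):
    """coeffs = [a_0, a_1, ..., a_n]; is f(c) = 0 in Z_p for some c?"""
    return any(sum(a * pow(c, k, p) for k, a in enumerate(coeffs)) % p == 0
               for c in range(p))


def is_irreducible_deg_2_or_3(coeffs, p):
    """Irreducibility over Z_p (p prime); valid only for degree 2 or 3."""
    coeffs = [a % p for a in coeffs]
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    if len(coeffs) - 1 not in (2, 3):
        raise ValueError("the root test is only valid for degree 2 or 3")
    return not has_zero_mod_p(coeffs, p)


x2_plus_1 = [1, 0, 1]                            # x^2 + 1
print(is_irreducible_deg_2_or_3(x2_plus_1, 3))   # True: no c in Z_3 has c^2 = -1
print(is_irreducible_deg_2_or_3(x2_plus_1, 5))   # False: 2^2 + 1 = 5 = 0 in Z_5
```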

The following facts are easily established. 


Theorem E.6. Let $\phi(x)$ and $f(x)$ be polynomials. If $\phi(x)$ is irreducible and $\phi(x)$ does not divide $f(x)$, then $\phi(x)$ and $f(x)$ are relatively prime.

Proof. Exercise. ∎

Theorem E.7. Any two distinct irreducible monic polynomials are relatively prime.

Proof. Exercise. ∎


Theorem E.8. Let $f(x)$, $g(x)$, and $\phi(x)$ be polynomials. If $\phi(x)$ is irreducible and divides the product $f(x)g(x)$, then $\phi(x)$ divides $f(x)$ or $\phi(x)$ divides $g(x)$.

Proof. Suppose that $\phi(x)$ does not divide $f(x)$. Then $\phi(x)$ and $f(x)$ are relatively prime by Theorem E.6, and so there exist polynomials $q_1(x)$ and $q_2(x)$ such that

$$1 = q_1(x)\phi(x) + q_2(x)f(x).$$

Multiplying both sides of this equation by $g(x)$ yields

$$g(x) = q_1(x)\phi(x)g(x) + q_2(x)f(x)g(x). \tag{7}$$

Since $\phi(x)$ divides $f(x)g(x)$, there is a polynomial $h(x)$ such that $f(x)g(x) = \phi(x)h(x)$. Thus (7) becomes

$$g(x) = q_1(x)\phi(x)g(x) + q_2(x)\phi(x)h(x) = \phi(x)\left[q_1(x)g(x) + q_2(x)h(x)\right].$$

So $\phi(x)$ divides $g(x)$. ∎
Corollary. Let $\phi(x), \phi_1(x), \phi_2(x), \ldots, \phi_n(x)$ be irreducible monic polynomials. If $\phi(x)$ divides the product $\phi_1(x)\phi_2(x)\cdots\phi_n(x)$, then $\phi(x) = \phi_i(x)$ for some $i$ $(i = 1, 2, \ldots, n)$.

Proof. We prove the corollary by mathematical induction on $n$. For $n = 1$, the result is an immediate consequence of Theorem E.7. Since $\phi(x)$ divides $\phi_1(x)$, the two are not relatively prime, and hence they are not distinct. Suppose then that for some $n > 1$, the corollary is true for any $n - 1$ irreducible monic polynomials, and let $\phi_1(x), \phi_2(x), \ldots, \phi_n(x)$ be $n$ irreducible monic polynomials. If $\phi(x)$ divides

$$\phi_1(x)\phi_2(x)\cdots\phi_n(x) = [\phi_1(x)\phi_2(x)\cdots\phi_{n-1}(x)]\,\phi_n(x),$$

then $\phi(x)$ divides the product $\phi_1(x)\phi_2(x)\cdots\phi_{n-1}(x)$ or $\phi(x)$ divides $\phi_n(x)$ by Theorem E.8. In the first case, $\phi(x) = \phi_i(x)$ for some $i$ $(i = 1, 2, \ldots, n - 1)$ by the induction hypothesis; in the second case, $\phi(x) = \phi_n(x)$ by Theorem E.7. ∎


We are now able to establish the unique factorization theorem, which is 
used throughout Chapters 5 and 7. This result states that every polynomial 
of positive degree is uniquely expressible as a constant times a product of 
irreducible monic polynomials. 


Theorem E.9 (Unique Factorization Theorem for Polynomials). For any polynomial $f(x)$ of positive degree, there exist a unique constant $c$; unique distinct irreducible monic polynomials $\phi_1(x), \phi_2(x), \ldots, \phi_k(x)$; and unique positive integers $n_1, n_2, \ldots, n_k$ such that

$$f(x) = c[\phi_1(x)]^{n_1}[\phi_2(x)]^{n_2}\cdots[\phi_k(x)]^{n_k}.$$


Proof. We begin by showing the existence of such a factorization using mathematical induction on the degree of $f(x)$. If $f(x)$ is of degree 1, then $f(x) = ax + b$ for some constants $a$ and $b$ with $a \ne 0$. Setting $\phi(x) = x + b/a$, we have $f(x) = a\phi(x)$. Since $\phi(x)$ is an irreducible monic polynomial, the result is proved in this case. Now suppose that the conclusion is true for any polynomial with positive degree less than some integer $n > 1$, and let $f(x)$ be a polynomial of degree $n$. Then

$$f(x) = a_n x^n + \cdots + a_1 x + a_0$$

for some constants $a_i$ with $a_n \ne 0$. If $f(x)$ is irreducible, then

$$f(x) = a_n\left(x^n + \frac{a_{n-1}}{a_n}x^{n-1} + \cdots + \frac{a_1}{a_n}x + \frac{a_0}{a_n}\right)$$

is a representation of $f(x)$ as a product of $a_n$ and an irreducible monic polynomial. If $f(x)$ is not irreducible, then $f(x) = g(x)h(x)$ for some polynomials $g(x)$ and $h(x)$, each of positive degree less than $n$. The induction hypothesis guarantees that both $g(x)$ and $h(x)$ factor as products of a constant and powers of distinct irreducible monic polynomials. Consequently $f(x) = g(x)h(x)$ also factors in this way, after combining the powers of any irreducible factor common to $g(x)$ and $h(x)$. Thus, in either case, $f(x)$ can be factored as a product of a constant and powers of distinct irreducible monic polynomials.

It remains to establish the uniqueness of such a factorization. Suppose that

$$f(x) = c[\phi_1(x)]^{n_1}[\phi_2(x)]^{n_2}\cdots[\phi_k(x)]^{n_k} = d[\psi_1(x)]^{m_1}[\psi_2(x)]^{m_2}\cdots[\psi_r(x)]^{m_r}, \tag{8}$$

where $c$ and $d$ are constants, $\phi_i(x)$ and $\psi_j(x)$ are irreducible monic polynomials, and $n_i$ and $m_j$ are positive integers for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, r$. Clearly both $c$ and $d$ must be the leading coefficient of $f(x)$; hence $c = d$. Dividing by $c$, we find that (8) becomes

$$[\phi_1(x)]^{n_1}[\phi_2(x)]^{n_2}\cdots[\phi_k(x)]^{n_k} = [\psi_1(x)]^{m_1}[\psi_2(x)]^{m_2}\cdots[\psi_r(x)]^{m_r}. \tag{9}$$

So $\phi_i(x)$ divides the right side of (9) for $i = 1, 2, \ldots, k$. Consequently, by the corollary to Theorem E.8, each $\phi_i(x)$ equals some $\psi_j(x)$, and similarly, each $\psi_j(x)$ equals some $\phi_i(x)$. We conclude that $r = k$ and that, by renumbering if necessary, $\phi_i(x) = \psi_i(x)$ for $i = 1, 2, \ldots, k$. Suppose that $n_i \ne m_i$ for some $i$. Without loss of generality, we may suppose that $i = 1$ and $n_1 > m_1$. Then by canceling $[\phi_1(x)]^{m_1}$ from both sides of (9), we obtain

$$[\phi_1(x)]^{n_1 - m_1}[\phi_2(x)]^{n_2}\cdots[\phi_k(x)]^{n_k} = [\phi_2(x)]^{m_2}\cdots[\phi_k(x)]^{m_k}. \tag{10}$$

Since $n_1 - m_1 > 0$, $\phi_1(x)$ divides the left side of (10) and hence divides the right side also. So $\phi_1(x) = \phi_i(x)$ for some $i = 2, \ldots, k$ by the corollary to Theorem E.8. But this contradicts that $\phi_1(x), \phi_2(x), \ldots, \phi_k(x)$ are distinct. Hence the factorizations of $f(x)$ in (8) are the same. ∎
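Over a small finite field $Z_p$ ($p$ prime), the factorization promised by Theorem E.9 can be found by brute force. The Python sketch below assumes that setting and is an illustration, not a method from the text. It divides out the leading coefficient $c$ and then tries monic divisors of increasing degree. The first monic divisor it finds has the smallest possible degree, so that divisor is irreducible, and dividing it out repeatedly gives its exponent. Once no divisor of degree at most half the remaining degree exists, what remains is irreducible. The search is exponential in the degree and only suits small examples. `pow(x, -1, p)` requires Python 3.8 or later.

```python
from itertools import product


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def divmod_mod_p(f, g, p):
    """Divide f by g over Z_p; polynomials are lists [c_0, ..., c_k] of residues."""
    f, g = trim([a % p for a in f]), trim([a % p for a in g])
    m = len(g) - 1
    inv_lead = pow(g[-1], -1, p)        # inverse of g's leading coefficient mod p
    r = list(f)
    q = [0] * max(len(f) - m, 0)
    while len(r) > m:
        shift = len(r) - 1 - m
        c = r[-1] * inv_lead % p
        q[shift] = c
        for i in range(m + 1):
            r[shift + i] = (r[shift + i] - c * g[i]) % p
        r = trim(r)                     # the leading term is now 0 and is removed
    return trim(q), r


def factor_mod_p(f, p):
    """Return (c, {phi: n}) with f = c * prod(phi^n), phi monic irreducible over Z_p."""
    f = trim([a % p for a in f])
    if len(f) < 2:
        raise ValueError("f must have positive degree")
    c = f[-1]
    f, _ = divmod_mod_p(f, [c], p)      # make f monic
    factors = {}
    degree = 1
    while len(f) > 1:
        if degree > (len(f) - 1) // 2:  # no divisor of degree <= deg(f)/2: f is irreducible
            factors[tuple(f)] = factors.get(tuple(f), 0) + 1
            break
        found = False
        for lower in product(range(p), repeat=degree):
            phi = list(lower) + [1]     # monic candidate of this degree
            q, r = divmod_mod_p(f, phi, p)
            if not r:                   # phi divides f
                factors[tuple(phi)] = factors.get(tuple(phi), 0) + 1
                f = q
                found = True
                break
        if not found:
            degree += 1
    return c, factors


# x^4 - 1 over Z_3 is (x + 1)(x + 2)(x^2 + 1); keys are coefficient tuples, constant term first.
print(factor_mod_p([-1, 0, 0, 0, 1], 3))   # (1, {(1, 1): 1, (2, 1): 1, (1, 0, 1): 1})
```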


It is often useful to regard a polynomial $f(x) = a_n x^n + \cdots + a_1 x + a_0$ with coefficients from a field $F$ as a function $f \colon F \to F$. In this case, the value of $f$ at $c \in F$ is $f(c) = a_n c^n + \cdots + a_1 c + a_0$. Unfortunately, for arbitrary fields there is not a one-to-one correspondence between polynomials and polynomial functions. For example, if $f(x) = x^2$ and $g(x) = x$ are two polynomials over the field $Z_2$ (defined in Example 4 of Appendix C), then $f(x)$ and $g(x)$ have different degrees and hence are not equal as polynomials. But $f(a) = g(a)$ for all $a \in Z_2$, so that $f$ and $g$ are equal polynomial functions. Our final result shows that this anomaly cannot occur over an infinite field.
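The anomaly is easy to reproduce. In the Python sketch below, coefficient lists are written from the constant term upward, and the helper name `as_function` is illustrative. The polynomials $x^2$ and $x$ are different coefficient lists, but they produce the same table of values on $Z_2$. This does not conflict with Corollary 2 to Theorem E.1: $x^2 - x$ has degree 2 and vanishes at both elements of $Z_2$, which is allowed. Over $Z_3$ the two functions already differ.

```python
def as_function(coeffs, p):
    """Tabulate c -> f(c) over Z_p for f = coeffs[0] + coeffs[1] x + ... ."""
    return tuple(sum(a * c**k for k, a in enumerate(coeffs)) % p for c in range(p))


f = [0, 0, 1]   # x^2
g = [0, 1]      # x
print(f == g)                                  # False: different polynomials
print(as_function(f, 2) == as_function(g, 2))  # True: same function on Z_2
print(as_function(f, 3) == as_function(g, 3))  # False: 2^2 = 1 differs from 2 in Z_3
```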


Theorem E.10. Let $f(x)$ and $g(x)$ be polynomials with coefficients from an infinite field $F$. If $f(a) = g(a)$ for all $a \in F$, then $f(x)$ and $g(x)$ are equal.

Proof. Suppose that $f(a) = g(a)$ for all $a \in F$. Define $h(x) = f(x) - g(x)$, and suppose that $h(x)$ is of degree $n \ge 1$. It follows from Corollary 2 to Theorem E.1 that $h(x)$ can have at most $n$ zeros. But

$$h(a) = f(a) - g(a) = 0$$

for every $a \in F$, so $h(x)$ has infinitely many zeros, contradicting the assumption that $h(x)$ has positive degree. Thus $h(x)$ is a constant polynomial, and since $h(a) = 0$ for each $a \in F$, it follows that $h(x)$ is the zero polynomial. Hence $f(x) = g(x)$. ∎
Answers to Selected Exercises


CHAPTER 1 


SECTION 1.1 
1. Only the pairs in (b) and (c) are parallel. 
2. (a) x = (3, -2, 4) + t(-8, 9, -3) (c) x = (3, 7, 2) + t(0, 0, -10)
3. (a) x = (2,—5,—-1) + s(—2,9, 7) + t(—5, 12, 2) 
(c) x = (—8,2,0) + s(9, 1,0) + t(14, —7, 0) 


SECTION 1.2 
1. (a)T (b)F (c)F (d)F (e)T (f)F
(g)F (h)F (i)T (j)T (k)T
3. $M_{13} = 3$, $M_{21} = 4$, and $M_{22} = 5$


sa( it) OG 3) 


(e) 224 +43 +4+22?—22+10 (g) 10x’ — 30x* + 40x? — 15a 
13. No, (VS 4) fails. 
14. Yes 
15. No 
17. No, (VS 5) fails. 
22. $2^{mn}$


SECTION 1.3 


1. (a)F (b)F (c)T (d)F (e)T (f)F (g)F


2. (a) & =) the trace is —5 (c) : = i, 


1 
(e) e (s) (5 6 7) 
5 


. (a) Yes (c) Yes (e) No 
11. No, the set is not closed under addition. 


15. Yes 


Go 
SECTION 1.4
1. (a) T (b) F (c) T (d) F (e) T (f) F 
2. (a) {r(1,1,0,0) + s(—3, 0, —2,1) + (5,0,4,0): r,s € R} 
(c) There are no solutions. 
(e) {r(10, -3, 1, 0, 0) + s(-3, 2, 0, 1, 0) + (-4, 3, 0, 0, 5): r, s ∈ R}
3. (a) Yes (c) No (e) No 
4. (a) Yes (c) Yes (e) No 
5. (a) Yes (c) No (e) Yes (g) Yes 


SECTION 1.5 
1. (a) F (b) T (c) F (d) F (e) T (f) T 
2. (a) linearly dependent (c) linearly independent  (e) linearly dependent 
(g) linearly dependent (i) linearly independent 


™ {6 0) G 9)} 


11. $2^n$


SECTION 1.6 
1. (a) F (b) T (c) F (d) F (e) T (f) F
(g) F (h) T (i) F (j) T (k) T (l) T
2. (a) Yes (c) Yes (e) No 
3. (a) No (c) Yes (e) No 
4. No 
5. No 
7. {u1, U2, Us} 
9. $(a_1, a_2, a_3, a_4) = a_1u_1 + (a_2 - a_1)u_2 + (a_3 - a_2)u_3 + (a_4 - a_3)u_4$
10. (a) —4n? —24+8 (c) —2? + 2a? + 4a —5 
13. {(1,1,1)} 
15. $n^2 - 1$
17. $\frac{1}{2}n(n - 1)$
26. n 
30. $\dim(W_1) = 3$, $\dim(W_2) = 2$, $\dim(W_1 + W_2) = 4$, and $\dim(W_1 \cap W_2) = 1$


SECTION 1.7 
1. (a)F (b)F (c)F (d)T (e)T (f)T 


CHAPTER 2 


SECTION 2.1 
1. (a) T (b) F (c) F (d) T (e) F (f) F (g) T (h) F 
2. The nullity is 1, and the rank is 2. T is not one-to-one but is onto.

4. The nullity is 4, and the rank is 2. T is neither one-to-one nor onto. 

5. The nullity is 0, and the rank is 3. T is one-to-one but not onto. 
10. T(2,3) = (5,11). T is one-to-one. 12. No. 


SECTION 2.2 
1. (a)T (b)T (c)F (d)T (e)T (f)F 


=A Or ea 
2.(a){3 4 (c) (2 1 -3) (dad) [-1 4 5 
1 0 is Oo a 
0 0 01 
0 0 1 0 
(ff): : 2 4 Ce) 10" Soe’ SLT) 
01 0 0 
1 0 0 0 
3 ad 3 
3. [T]} = 0 1 and [T]% = 2 3 
2 4g 2 4 
3 3 3 
100 0 01 0 1 
00 1 0 222 -2 
5 (@) 19 1 0 0 ) 10 0 0 (eye 
Othe of 0 0 2 4 
ie eg 0 
011 0 
001 0 
10. 
0 0 0 1 
00 0 1 
SECTION 2.3 


1. (a)F (b)T (c)F (d)T (e)F (f)F 
(g)F (bh) F oe (j) T 


2. (a) A(2B +3C) = 4 .) andl ABD) = (_ 36) 


(b) A'B 


23 19 O 
26 —1 

0 

6 

4 
12. (a) No. (b) No.


SECTION 2.4 


1. (a) F (b) T (c) F (d) F (e) T (f) F 
(g)T (h)T (i)T

2. (a) No (b) No (c) Yes (d) No (e) No (f) Yes 

3. (a) No (b) Yes (c) Yes (d) No 


1 0 0 0 
0 0 1 0 
TOA arg se 
0 0 0 1 


SECTION 2.5 
1. (a)F (b)T (c)T (d)F (e)T


SOG n) OG 


a2 b2 C2 0 -1 0 5 —6 3 
3. (a) a1 by C1 (c) 1 0 (e) 0 4 —1 
ap bo co -3 1 3 -1 2 


7. (a) T(t,y) = 5 


SECTION 2.6 

1. (a) F (b) T (c) T (d) T (e) F (f) T (g) T (h) F 
. The functions in (a), (c), (e), and (f) are linear functionals. 
. (a) fi(z,y, 2) =a -— 5Y, fo(x,y,z) = SY; and f3(x,y,z) = —x%+z 
. The basis for V is {pi(x), p2(x)}, where pi(x) = 2 — 2x and p2(x) = —5 +2. 
. (a) T'(f) =g, where g(a + br) = —3a — 4b 


Orme=(S 4) wre 4.) 


N 8 WOW WN 
SECTION 2.7


1. (a) T (b) T (c) F (d) F (e) T (f) F (g) T 
2. (a) F (b) F (c) T (d) T (e) F 

3. (a) {e-',te*} (c) {e-*, te‘, e",te’} (e) {e7*, €* cos 2t, e sin 2t} 
A. (a) fel+v5)t/2 o(1—v5)t/24 (c) Live=t oe 2 


CHAPTER 3 


SECTION 3.1 


1. (a)T (b)F (c)T (d)F (e)T (f)F 
(g)T (h)F (i)T 


2. Adding —2 times column 1 to column 2 transforms A into B. 


00 1 1 0 0 
3. (a) f 1 7 (c) (: 1 7 
ik Oe 50 28 


SECTION 3.2 


1. (a)F (b)F (c)T (d)T (e)F (f)T 
(g)T (h)T (i)T


-(a)2— (ec) (e)3 (g) 1 


2 
1 0 0 0 
(a) |0 1 O OJ; the rank is 2. 
0 0 0 0 


. (a) The rank is 2, and the inverse is ( 


S) 


> 


—1 2 
1 


(c) The rank is 2, and so no inverse exists. 


1! 


| 
an 
N= 


Pad 
(e) The rank is 3, and the inverse is $ 0 -$ 
i i i 
6 3 2 
—51 15 te AD. 
, P : 31 -9 -4 —-7 
(g) The rank is 4, and the inverse is 10 3 1 3
—3 1 1 1 


6. (a) $T^{-1}(ax^2 + bx + c) = -ax^2 - (4a + b)x - (10a + 2b + c)$
1 
2 


a 
1 0 O\ /1 0 0 
7 {0 1 0 1 1 0 
1 0 1/7 \0 0 1 
1
2 
20. (a) 1 
0 
0 
ooooo
ooooo 
O'S OOO: 


SECTION 3.3 
1. (a)F  (b)F (ce) T i ()F ()F  (g)T 


7. 
11. 


o(} © 


} 
1 
CH | 
—1 —3 1 
0 i —1 
1 0 1 
2 = 
:tEeR of (2) +4( i) rel 
1 1 


= 
o} 
— 
LO 
| 
oor w 
Oro w 


—= 
feb) 
wm 
aN 

oO ou 
Se 
a 
aie 
= 


1 —2 3 -1 
0 1 0 0 
0 en 0 1 +t 0 Tr,s,tER 
0 0 0 1 
0 —3 1 
0 1 1 
(g) oj tr 1 0 :7,8,ER 
1 0 1 
3 0 3 Ly 3 
aus 22) -alelei-o 
—~4 2 _1 x3 —2 
9 3 9 


2 1 
T~'{(1,11)} = saat (-) teR 


The systems in parts (b), (c), and (d) have solutions. 


(h) F 


The farmer, tailor, and carpenter must have incomes in the proportions 4 : 3: 4. 


13. There must be 7.8 units of the first commodity and 9.5 units of the second. 


SECTION 3.4 


1. (a)F (b)T (c)T (d)T (e)F (f)T (g)T
Z 2 4 4 1
3 0 1 0 
2. (a) | -3 (c) (e) +r +s :7,sER 
(3) = 1 0 2 
-1 0 0 1 
—23 1 —23 
0 1 0 
(g) 7T/+r]O]+s 6]: 7,6ER 
9 0 9 
0 1 


+t 


1 0 2 1 4 
5. 1 1 33 2 7 
3 ly, al 0 -9 


7. {u1, U2, us} 
11. (b) {(1, 2, 1,0, 0), (2, 1,0, 0, 0), (1,0, 0, 1,0), (—2, 0, 0,0, 1)} 
13. (b) {(1,0, 1, 1, 1,0), (0, 2,1, 1,0, 0), (1, 1, 1,0, 0, 0), (-3, —2,0, 0, 0, 1)} 


(c) There are no solutions. 


Oo Oo OF AA 
| 
NFR RB 
Pe er: 
os 
Mm 
ay 


CHAPTER 4 


SECTION 4.1 
1. (a) F (b) T (c) F (d) F (e) T 
2. (a) 30 (c) —8 
3. (a) —104 15% (c) —24 
4. (a) 19 (c) 14
SECTION 4.2 
1. (a)F (b)T (c)T (d)T (e)F (f)F (g)F (h)T
3. 42 5. —12 7. —12 9. 22 11. —3 
13. -8 15. 0 17. -49 19. $-28 - i$ 21. 95
SECTION 4.3
1. (a) F (b) T (c) F (d) T (e) F (f) T (g) F (h) F 
3. (4,—3, 0) 5. (—20, —48, —8) 7. (0, —12, 16) 

24. $t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0$


10 O 0 

26. (a) ( fe a) (c) (: —20 7 
44121 11 

0 oO -8 


~3i 0 0 18 28 -6 
(e) 4 te. “0 (g) |-20 -21 37 
10+16i —5—3i 343% 48 14 —16 


SECTION 4.4 
1. (a) T (b) T (c) T (d) F (e) F (f) T
(g) T (h) F (i) T (j) T (k) T
2. (a) 22 (c) $2 - 4i$
3. (a) —12 (c) —12 (e) 22 (g) -3 
4. (a) 0 (c) —49 (e) —28 —i (g) 95 


SECTION 4.5 
1. (a) F (b) T (c) T (d) F (e) F (f) T 
3. No 5. Yes 7. Yes 9. No 


CHAPTER 5 


SECTION 5.1 


1. (a)F (b)T (c)T (d)F (e)F (f)F
(g)F (h)T (i)T (j)F (k)F


-1 0 0 
2. (a) Mo=(_7 G) 2 ©) m= ( 01 | yes 


-1 1 0 0 
0 -l 1 0 

() Me=| 9 9 +1 of ™ 
0 0 0 —-1l 


3. (a) The eigenvalues are 4 and —1, a basis of eigenvectors is 


Uae eye als eee Peg a) 


(c) The eigenvalues are 1 and —1, a basis of eigenvectors is 


{ats Cats) a= Gts at) mt r= (6 1). 
4. (a) $\lambda = 3, 4$; $\beta = \{(3, 5), (1, 2)\}$
(b) $\lambda = -1, 1, 2$; $\beta = \{(1, 2, 0), (1, -1, -1), (2, 0, -1)\}$
(f) \=1,3 G={-24+2,-44+27,-8+23,2} 
-1 0 1 1 0 0 0 


{ 
(ENS no pees eee p={ 


2 0 1 0 0 1 1 O 
wrens fC 6 MC MGI) 
26. 4 
SECTION 5.2 


1. (a)F (b)F (c)F (d)T (e)T (f)F 
(g)T (h)T (i)F


2. (a) Not diagonalizable (c)Q= G = 


—1 0 -1 
3. (a) Not diagonalizable (c) Not diagonalizable 


1 1 1 
(e) Not diagonalizable (g) Q= 2 -1 7 


(d) B= {x a x eae y a (e) B= {(, 1), (1,-1)} 


Pfs Eee 2G) = 34)" 
= 3 e (1) Sy "4 a) 


14. (b) 2(t) = cre (1) i Fat 1) 


1 
(c) x(t) =e" | (") + C2 
0 


SECTION 5.3 
1. (a)T (b)T (c)F (d)F (e)T (f)T 
(g)T (h)F (i)F (j)T


2. (a) () 5) © 


-1 0 -1l 
(g) |-4 1 -2 (i) No limit exists. 


a 
18 (e) No limit exists. 
6 
13 


Sle als 


2 0 2 


6. One month after arrival, 25% of the patients have recovered, 20% are ambu- 
latory, 41% are bedridden, and 14% have died. Eventually #2 recover and % 


die. 




7. 2. 
7 
8. Only the matrices in (a) and (b) are regular transition matrices. 
a ee 
3° 33), <3: 
9. (a)/i 1 21 (c) No limit exists. 
3 3 43 
eo. 
BB 8: 
0 0 0 0 
0 0 0 
0 0 0 0 
(e) J 2 1 0 (g) 
5 1 1 41 9 
‘ oo 2 
2 132: 
0.225 0.20 
10. (a) | 0.441] after two stages and | 0.60 }] eventually 
0.334 0.20 
0.372 0.50 
(c) | 0.225 | after two stages and | 0.20 | eventually 
0.403 0.30 
1 
0.329 3 
(e) | 0.334 | after two stages and | 1 | eventually 
0.337 
1 
3 


12. 9/19 new, 6/19 once-used, and 4/19 twice-used


13. In 1995, 24% will own large cars, 34% will own intermediate-sized cars, and 
42% will own small cars; the corresponding eventual percentages are 10%, 30%, 
and 60%. 


20. $e^O = I$ and $e^I = eI$.


SECTION 5.4 
1. (a)F (b)T (c)F (d)F (e)T (f)T (g)T
2. The subspaces in (a), (c), and (d) are T-invariant. 
1 1 a 


eo@il(f(alt of 9} 
0 1 2 
9. (a) $-t(t^2 - 3t + 3)$ (c) $1 - t$


10. (a) $t(t - 1)(t^2 - 3t + 3)$ (c) $(t - 1)^2(t + 1)$


ay ee 
18. (c) At = 5{9 1 8 
0 oO -2 
31. (a) $t^2 - 6t + 6$ (c) $-(t + 1)(t^2 - 6t + 6)$


CHAPTER 6 


SECTION 6.1 
1. (a)T (b)T (c)F (d)F (e)F (f)F (g)F (h)T
2. $\langle x, y\rangle = 8 + 5i$, $\|x\| = \sqrt{7}$, $\|y\| = \sqrt{14}$, and $\|x + y\| = \sqrt{37}$.


3. $\langle f, g\rangle = 1$, $\|f\| = \frac{\sqrt{3}}{3}$, $\|g\| = \sqrt{\frac{e^2 - 1}{2}}$, and $\|f + g\| = \sqrt{\frac{11 + 3e^2}{6}}$
16. (b) No 
SECTION 6.2 


1. (a)F (b)T (c)T (d)F (e)T (f)F (g)T


2. For each part the orthonormal basis and the Fourier coefficients are given. 
(b) {2(1,1, 0), (-2,1,0),-20,-1,)}; 248, -¥8, Y. 
(c) {1,2V3(« — 4), 6V5(a?7-2+})}; 3, 4,0. 
(e) {3(2,-1,-2,4), Zag(—4,2,-3,1), Gieg(-3,4,9,7)$; 10, 3V30, VIBE 


it 


(k) { 1_(—4,3 — 2i,i,1 — 4i), 4. (3 — i, -5i, -2 + 41,2 +4), 


Gigg(-17 — i, -9 + 8i, -18 + 63, —9 4 3i)}; 


V47(—1 — i), V60(—1 4 2%), V/1160(1 + 4) 
ae ae 2 a 1 -4 —11—9i 
om) { (I ‘cae Ga e 1-i i" 
1 —5-—118 —7—-26i\). ; ied 
saan ( ~145i 58 yi VI82+4), 246(—1—4), 0 
4. $S^\perp = \operatorname{span}(\{(i, -\frac{1}{2}(1 + i), 1)\})$


5. $S_0^\perp$ is the plane through the origin that is perpendicular to $x_0$; $S^\perp$ is the line through the origin that is perpendicular to the plane containing $x_1$ and $x_2$.


29 
1 26 1 
ww 2(2) wd (*) 


20. (b) 
SECTION 6.3
1. (a)T (b)F (c)F (d)T (e)F (f)T (g)T
2. (a) $y = (1, -2, 4)$ (c) $y = 210x^2 - 204x + 33$
3. (a) T*(x) = (11, -12) (c) T*(f(t)) = 12 + 6t 
14. $T^*(x) = \langle x, z\rangle y$
20. (a) The linear function is $y = -2t + 5/2$ with $E = 1$, and the quadratic function is $y = t^2/3 - 4t/3 + 2$ with $E = 0$.
(b) The linear function is $y = 1.25t + 0.55$ with $E = 0.3$, and the quadratic function is $y = t^2/56 + 15t/14 + 239/280$ with $E = 0.22857$ (approximation).


21. The spring constant is approximately 2.1. 


22. (a) c=2,y=2,2=34 (dq) c=S,y=$.2=4,v=-H 


SECTION 6.4 
1. (a)T (b)F (c)F (d)T (e)T (f)T (g)F (h)T


2. (a) T is self-adjoint. An orthonormal basis of eigenvectors is 


eat —2), Fle y}, with corresponding eigenvalues 6 and 1. 
(c) T is normal, but not self-adjoint. An orthonormal basis of eigenvectors 
is 


{50 +i, V2), = (1 + 4, va} with corresponding eigenvalues 


Bip Casa 


v2 V2 


(e) T is self-adjoint. An orthonormal basis of eigenvectors is 


tw o)-v0 1) al oC 1} 


with corresponding eigenvalues 1, 1, —1, —1. 


SECTION 6.5 
1. (a)T (b)F (c)F (d)T (e)F (f)T 
(g)F (h)F (i)F 


2. @) P= (5 ce and p= (3 a) 


1 
Vi 2 00 
(d) P=} —4, and D={ 0 -2 0 
; 0 04 


gle Se SI 
gh Sk Sh 


4. $T_z$ is normal for all $z \in C$, $T_z$ is self-adjoint if and only if $z \in R$, and $T_z$ is unitary if and only if $|z| = 1$.


5. Only the pair of matrices in (d) are unitarily equivalent. 
25. 2 - 4)
? 
26. (a)v-S (by w+ $ 
1 1 1 1 
27. (a) x= —-2'+-~y' and = —=a7' — —~y/’ 
(a) x ee get Pt Be 
The new quadratic form is $3(x')^2 - (y')^2$.
3 , 2 ! —2 ! 2 U 
c) © = =a’ + and = 2" + 
(c) JB VIB" ete 13? 
The new quadratic form is $5(x')^2 - 8(y')^2$.
iW V2 V2 2w? 
= 1 1 V6 = V3 
29. (c) P a wa ; and R 0 v3 
o + *¥ G>- ig” ae 
(e) x1 = 3, re 5, 73 =4 
SECTION 6.6 
1. (a) F (b) T (c) T (d) F (e) F 
1 2 
2. For W = span({(1,2)}), [T]¢ = : : ; 
5 OS 
3. (2) (a) Ti(a,b) = $(a+b,a +b) and T2(a,b) = $(a — b, —a + b) 
(d) Ti(a,b,c) = $(2a — b—c, —a + 2b— cc, -—a— b+ 2c) and 
T2(a,b,c) = ¢(at+b+c,atb+c,at+bt+c) 


SECTION 6.7 
1. (a)F (b)F (c)T (d)T (e)F (f)F (g)T


=e (iaa() wage(thoe a (tee (2) 


Gis V3, 02> V2 
1, 1 
(c) ui = ao v2 = Gene v3 = r= 
cosx + 2sinzx 2cosx — sinx 1 
w= Jan ,u2 = Jan , U3 = Word 


o1 = V5, 02 = V5, 03 = 2 
27 0 1 ES
Ke a Se - Ee v5 0 1 cl 
(c) TO V2 TO 0 1 & 2) 
sll. =o 0 —-—~ 0 0 fb, wl 
10 V2 10 V2 V2 
“. ee i 0 0 
10 2 10 
1+i 1+i 2 da} 
e ( 2 my ie ) ve V6 
1-i —1 Hi 0 0 Iti 
2 2 V6 


Pe eb se — 2 
5. (a) T(e,y,2) = (= * 2 *) 


(c) $T^\dagger(a + b\sin x + c\cos x) = T^{-1}(a + b\sin x + c\cos x) = \frac{a}{2} + \frac{(2b + c)\sin x + (-b + 2c)\cos x}{5}$
1/1 1 -1 1/1 -—-2 3 1 1fl—i 1+i 
6) 5 (; 1 a © 5(; BF 28 ;) (ty = 
7. (a) $Z_1 = N(T)^\perp = R^2$ and $Z_2 = R(T) = \operatorname{span}\{(1, 1, 1), (0, 1, -1)\}$
(c) $Z_1 = N(T)^\perp = V$ and $Z_2 = R(T) = V$
8. (a) No solution z G) 


SECTION 6.8 
1. (a) F (b) F (c) T (d) F (e) T (f) F
(g) F (h) F (i) T (j) F
4. (a) Yes (b) No (c) No (d) Yes (e) Yes (f) No


ne oe ie aay tee 00 00 

000 0 Se a ar 

sy (2 0 3] ) 19 0 0 0 (1 oo 0 0 

{Gee td 2 0 -8 0 

1 1 

2 1 Va 0 Va 

17. (a) and (b) 4 |}, ( (slo |.fil.]o 
uf 2 

V5 v5 us 0 neil 

Va Va 


18. Same as Exercise 17(c) 


22. a) a= (j = and Da(G os 
wmo-(' z and v= (’ 2) 


00 1 =i 
(c)Q={0 1 -0.25] and D=|{ 0 
i 0 2 0 


or OO 


aoe 
NI 
nt 
Se 
SECTION 6.9
2b ig a. eee 
Vraat ae 
ret i > oo. ae 
SECTION 6.10


1. (a)F (b)T (c)T (d)F (e)F 
2. (a) $\sqrt{18}$ (c) approximately 2.34


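4. (a) $\|A\| \approx 84.74$, $\|A^{-1}\| \approx 17.01$, and $\operatorname{cond}(A) \approx 1441$
(b) $\|\tilde{x} - A^{-1}b\| \le \|A^{-1}\| \cdot \|A\tilde{x} - b\| \approx 0.17$ and
$\frac{\|\tilde{x} - A^{-1}b\|}{\|A^{-1}b\|} \le \operatorname{cond}(A)\,\frac{\|b - A\tilde{x}\|}{\|b\|} \approx \frac{14.41}{\|b\|}$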
5. 0.001 < @=#l < 49 


eal 


1 
6. R (-) = >, (Bi ond cond ay S 


SECTION 6.11 


1. (a) F (b) T (c) T (d) F (e) T (f) F
(g) F (h) F (i) T (j) F


3. 0) {o(%3): ven} 
4 


. (b) UGE rer} if 9 =Oand {4 (“S 51) rer} if 640 


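7. (c) There are six possibilities:
(1) Any line through the origin if $\phi = \psi = 0$
(2) $\{t(0, 0, 1) : t \in R\}$ if $\phi = 0$ and $\psi = \pi$
(3) $\{t(\cos\psi + 1, -\sin\psi, 0) : t \in R\}$ if $\phi = \pi$ and $\psi \neq \pi$
(4) $\{t(0, -\sin\phi, \cos\phi + 1) : t \in R\}$ if $\psi = \pi$ and $\phi \neq \pi$
(5) $\{t(0, 1, 0) : t \in R\}$ if $\phi = \psi = \pi$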




586 Answers to Selected Exercises 


(6) $\{t(\sin\phi(\cos\psi + 1), -\sin\phi\sin\psi, \sin\psi(\cos\phi + 1)) : t \in R\}$ otherwise


CHAPTER 7 


SECTION 7.1 
1. (a) T (b) F (c) F (d) T (e) F (f) F (g) T (h) T 


roners (3-0) 669 

meres) Sera) sae 
3) 

om=i{6 3-036 ).6 9} 


SECTION 7.2 
1. (a) T (b) T (c) F (d) T (e) T (f) F (g) F (h) T
2
3. (a) For \ = 2, {2, 22, x7} v(t 
0 
CORE
210000
A, 0 O eres: 
2,J=|]O0 Ar O where A; = ; 
Oy) ak 0002 10 
3 00002 0 
000 0 02 
04 1 0 ae 110
came mere? ie) A aire cE 
000 4 
M = 2 =3 
5 2 © 
3. (a) —(t— 2)’(t—3) (b) ae Cae 


(c) $\lambda_2 = 3$ (d) $p_1 = 3$ and $p_2 = 1$
(e) (i) rank(Ui) = = 3 and rank(Us) 
(ii) rank(U?) = 1 and rank(U3) 
(iii) nullity(U1) = = 2 and nullity 
(iv) nullity(U?) = 4 and nullity 
4. (a) $J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}$ and $Q = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & 2 \\ 1 & -1 & 0 \end{pmatrix}$
0 1 0 0 1 0 1 -1 
0 0 0 0 1 -1 0O 1 
9%, 2¢ |) PPARs oto: 4 
0 0 0 2 1 0 1 0 
5. (a) $J = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$ and $\beta = \{2e^t, 2te^t, t^2e^t, e^{2t}\}$
2 1 0 0 
_{o 200 _ ee 
(c) J= 0 Oo 4 and (= {6z,x°,2,a*} 
0 0 0 2 
(d) $J = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}$ and
fA (° : : ai io 
- 0 : ; 2 0 
i)

— 
Cae. < 
Sao 


3) il 
3) *\2 
0 1 
(c1 a cot) +e.}1 + c3e 1 
0 —1 
x 1 0 0 
(b) y (c1 + cot + c3t”) Oy} + (c2 + 2cst) 1] +2c3 | 0 
z 0 0 1 
SECTION 7.3 


1. (a)F (b)T (c)F (d)F (e)T (f)F 
(g)F (h)T (i) T 




2. (a) $(t - 1)(t - 3)$ (c) $(t - 1)^2(t - 2)$ (d) $(t - 2)^2$

3. (a) $t^2 - 2$ (c) $(t - 2)^2$ (d) $(t - 1)(t + 1)$

4. For (2), (a); for (3), (a) and (d)

5. The operators are $T_0$, $I$, and all operators having both 0 and 1 as eigenvalues.
SECTION 7.4 

1. (a) T (b) F (c) F (d) T (e) T (f) F (g) T
2. (a) {1 0 —27 (b)
Absolute value of a complex number, 558
Absorbing Markov chain, 304 
Absorbing state, 304 
Addition 
of matrices, 9 
Addition of vectors, 6 
Additive function, 78 
Additive inverse 
of an element of a field, 553 
of a vector, 12 
Adjoint 
of a linear operator, 358-360 
of a linear transformation, 367 
of a matrix, 331, 359-360 
uniqueness, 358 
Algebraic multiplicity of an eigen- 
value, see Multiplicity of an 
eigenvalue 
Algebraically closed field, 482, 561 
Alternating n-linear function, 239 
Angle between two vectors, 202, 
335 
Annihilator 
of a subset, 126 
of a vector, 524, 528 
Approximation property of an or- 
thogonal projection, 399 
Area of a parallelogram, 204 
Associated quadratic form, 389 
Augmented matrix, 161, 174 
Auxiliary polynomial, 131, 134, 137— 
140 
Axioms of the special theory of 
relativity, 453 
Axis of rotation, 473 


Back substitution, 186 
Backward pass, 186 
Basis, 43-49, 60-61, 192-194 
cyclic, 526 
dual, 120 


Jordan canonical, 483 
ordered, 79 
orthonormal, 341, 346-347, 372 
rational canonical, 526 
standard basis for Fⁿ, 43
standard basis for Pₙ(F), 43
standard ordered basis for Fⁿ,
79
standard ordered basis for Pₙ(F),
79

uniqueness of size, 46 

Bessel’s inequality, 355 

Bilinear form, 422-433 
diagonalizable, 428 
diagonalization, 428-435 
index, 444 
invariants, 444 
matrix representation, 424-428 
product with a scalar, 423 
rank, 443 
signature, 444 
sum, 423 
symmetric, 428-430, 433-435 
vector space, 424 


Cancellation law for vector addi- 
tion, 11 
Cancellation laws for a field, 554 
Canonical form 
Jordan, 483-516 
rational, 526-548 
for a symmetric matrix, 446 
Cauchy-Schwarz inequality, 333
Cayley-Hamilton theorem
for a linear operator, 317 
for a matrix, 318, 377 
Chain of sets, 59 
Change of coordinate matrix, 112— 
115 
Characteristic of a field, 23, 41, 
42, 430, 449, 555 
Characteristic polynomial, 373 
of a linear operator, 249
of a matrix, 248 
Characteristic value, see Eigenvalue 
Characteristic vector, see Eigen- 
vector 
Classical adjoint 
of an n × n matrix, 231
of a 2 × 2 matrix, 208
Clique, 94, 98 
Closed model of a simple econ- 
omy, 176-178 
Closure 
under addition, 17 
under scalar multiplication, 17 
Codomain, 551 
Coefficient matrix of a system of 
linear equations, 169 
Coefficients 
Fourier, 119 
of a differential equation, 128 
of a linear combination, 24, 43 
of a polynomial, 9 
Cofactor, 210, 232 
Cofactor expansion, 210, 215, 232 
Column of a matrix, 8 
Column operation, 148 
Column sum of matrices, 295 
Column vector, 8 
Companion matrix, 526 
Complex number, 556-561 
absolute value, 558 
conjugate, 557 
fundamental theorem of alge- 
bra, 482, 560 
imaginary part, 556 
real part, 556 
Composition 
of functions, 552 
of linear transformations, 86— 
89 
Condition number, 469 
Conditioning of a system of linear 
equations, 464 
Congruent matrices, 426, 445, 451 
Conic sections, 388-392 
Conjugate linear property, 333 
Conjugate of a complex number,
557 
Conjugate transpose of a matrix, 
331, 359-360 
Consistent system of linear equa- 
tions, 169 
Consumption matrix, 177 
Convergence of matrices, 284-288 
Coordinate function, 119-120 
Coordinate system 
left-handed, 203 
right-handed, 202 
Coordinate vector, 80, 91, 110- 
111 
Corresponding homogeneous sys- 
tem of linear equations, 172 
Coset, 23, 109 
Cramer’s rule, 224 
Critical point, 439 
Cullen, Charles G., 470 
Cycle of generalized eigenvectors, 
488-491 
end vector, 488 
initial vector, 488 
length, 488 
Cyclic basis, 526 
Cyclic subspace, 313-317 


Degree of a polynomial, 10 
Determinant, 199-243 
area of a parallelogram, 204 
characterization of, 242 
cofactor expansion, 210, 215, 232 
Cramer’s rule, 224 
of an identity matrix, 212 
of an invertible matrix, 223 
of a linear operator, 258, 474, 
476-477 
of a matrix transpose, 224 
of an n × n matrix, 210, 232
n-dimensional volume, 226 
properties of, 234-236 
of a square matrix, 367, 394 
of a 2 × 2 matrix, 200
uniqueness of, 242 
of an upper triangular matrix, 
Index


volume of a parallelepiped, 226 
Wronskian, 232 
Diagonal entries of a matrix, 8 
Diagonal matrix, 18, 97 
Diagonalizable bilinear form, 428 
Diagonalizable linear operator, 245 
Diagonalizable matrix, 246 
Diagonalization 
of a bilinear form, 428-435 
problem, 245 
simultaneous, 282, 325, 327, 376, 
405 
of a symmetric matrix, 431-433
test, 269, 496 
Diagonalize, 247 
Differentiable function, 129 
Differential equation, 128 
auxiliary polynomial, 131, 134, 
137-140 
coefficients, 128 
homogeneous, 128, 137-140, 523 
linear, 128 
nonhomogeneous, 142 
order, 129 
solution, 129 
solution space, 132, 137-140 
system, 273, 516 
Differential operator, 131 
null space, 134-137 
order, 131, 135 
Dimension, 47—48, 50-51, 103, 119, 
425 
Dimension theorem, 70 
Direct sum 
of matrices, 320-321, 496, 545 
of subspaces, 22, 58, 98, 275- 
279, 318, 355, 366, 394, 398, 
401, 475-478, 494, 545 
Disjoint sets, 550 
Distance, 340 
Division algorithm for polynomi- 
als, 562 
Domain, 551 
Dominance relation, 95-96, 99 
Dot diagram 
of a Jordan canonical form, 498— 
500 
of a rational canonical form, 535—
539 
Double dual, 120, 123 
Dual basis, 120 
Dual space, 119-123 


Economics, see Leontief, Wassily 
Eigenspace 
generalized, 485-491 
of a linear operator or matrix, 
264 
Eigenvalue 
of a generalized eigenvector, 484 
of a linear operator or matrix, 
246, 371-374, 467-470 
multiplicity, 263 
Eigenvector 
generalized, 484-491 
of a linear operator or matrix, 
246, 371-374
Einstein, Albert, see Special the- 
ory of relativity 
Element, 549 
Elementary column operation, 148, 
153 
Elementary divisor 
of a linear operator, 539 
of a matrix, 541 
Elementary matrix, 149-150, 159 
Elementary operation, 148 
Elementary row operation, 148, 153, 
217 
Ellipse, see Conic sections 
Empty set, 549 
End vector of a cycle of general- 
ized eigenvectors, 488 
Entry of a matrix, 8 
Equality 
of functions, 9, 551 
of matrices, 9 
of n-tuples, 8 
of polynomials, 10 
of sets, 549 
Equilibrium condition for a sim- 
ple economy, 177 
Equivalence relation, 107, 551 
congruence, 449, 451 
unitary equivalence, 394, 472

Equivalent systems of linear equa- 
tions, 182-183 

Euclidean norm of a matrix, 467— 
470 

Euler’s formula, 132 

Even function, 15, 21, 355 

Exponential function, 133-140 

Exponential of a matrix, 312, 515 

Extremum, see Local extremum 


Field, 553-555 
algebraically closed, 482, 561 
cancellation laws, 554 
characteristic, 23, 41, 42, 430, 

449, 555 

of complex numbers, 556-561 
product of elements, 553 
of real numbers, 549 
sum of elements, 553 

Field of scalars, 6-7, 47 


Finite-dimensional vector space, 46— 


51 

Fixed probability vector, 301 
Forward pass, 186 
Fourier, Jean Baptiste, 348 
Fourier coefficients, 119, 348, 400 
Frobenius inner product, 332 
Function, 551-552 

additive, 78 

alternating n-linear, 239 

codomain of, 551 

composite, 552 

coordinate function, 119-120 

differentiable, 129 

domain of, 551 

equality of, 9, 551 

even, 15, 21, 355 

exponential, 133-140 

image of, 551 

imaginary part of, 129 

inverse, 552 

invertible, 552 

linear, see Linear transforma- 

tion 
n-linear, 238-242 
norm, 339 




odd, 21, 355 

one-to-one, 551 

onto, 551 

polynomial, 10, 51-53, 569 

preimage of, 551 

range of, 551 

real part of, 129 

restriction of, 552 

sum of, 9 

vector space, 9 

Fundamental theorem of algebra, 

482, 560 


Gaussian elimination, 186-187 
back substitution, 186 
backward pass, 186 
forward pass, 186 

General solution of a system of 

linear equations, 189 

Generalized eigenspace, 485-491 

Generalized eigenvector, 484-491 

Generates, 30 

Generator of a cyclic subspace, 313 

Geometry, 385, 392, 436, 472-478 

Gerschgorin’s disk theorem, 296 

Gram-Schmidt process, 344, 396 

Gramian matrix, 376 


Hardy-Weinberg law, 307

Hermitian operator or matrix, see 
Self-adjoint linear operator 
or matrix 

Hessian matrix, 440 

Homogeneous linear differential equation, 128, 137-140, 523

Homogeneous polynomial of de- 
gree two, 433 

Homogeneous system of linear equa- 
tions, 171 

Hooke’s law, 128, 368 

Householder operator, 397 


Identity element 

in C, 557 

in a field, 553, 554 
Identity matrix, 89, 93, 212 
Identity transformation, 67 
Ill-conditioned system, 464 
Image, see Range
Image of an element, 551 
Imaginary number, 556 
Imaginary part 
of a complex number, 556 
of a function, 129 
Incidence matrix, 94-96, 98 
Inconsistent system of linear equa- 
tions, 169 
Index 
of a bilinear form, 444 
of a matrix, 445 
Infinite-dimensional vector space, 47
Initial probability vector, 292 
Initial vector of a cycle of gener- 
alized eigenvectors, 488 
Inner product, 329-336 
Frobenius, 332 
on H, 335 
standard, 330 
Inner product space 
complex, 332 
H, 332, 343, 348-349, 380, 399 
real, 332 
Input-output matrix, 177
Intersection of sets, 550 
Invariant subspace, 77-78, 313- 
315 
Invariants 
of a bilinear form, 444 
of a matrix, 445 
Inverse 
of a function, 552 
of a linear transformation, 99— 


102, 164-165 
of a matrix, 100-102, 107, 161— 
164 


Invertible function, 552 
Invertible linear transformation, 99— 


102 

Invertible matrix, 100-102, 111, 
223, 469 

Irreducible polynomial, 525, 567— 
569 


Isometry, 379 
Isomorphic vector spaces, 102-105 
Isomorphism, 102-105, 123, 425


Jordan block, 483 
Jordan canonical basis, 483 
Jordan canonical form 
dot diagram, 498-500 
of a linear operator, 483-516 
of a matrix, 491 
uniqueness, 500 


Kernel, see Null space 
Kronecker delta, 89, 335 


Lagrange interpolation formula, 51— 
53, 125, 402 
Lagrange polynomials, 51, 109, 125 
Least squares approximation, 360— 
364 
Least squares line, 361 
Left shift operator, 76 
Left-handed coordinate system, 203 
Left-multiplication transformation, 
92-94 
Legendre polynomials, 346 
Length of a cycle of generalized 
eigenvectors, 488 
Length of a vector, see Norm 
Leontief 
closed model, 176-178 
open model, 178-179 
Leontief, Wassily, 176 
Light second, 452 
Limit of a sequence of matrices, 
284-288 
Linear combination, 24—26, 28-30, 
39 
uniqueness of coefficients, 43 
Linear dependence, 36-40 
Linear differential equation, 128 
Linear equations, see System of 
linear equations 
Linear functional, 119 
Linear independence, 37-40, 59— 
61, 342 
Linear operator, (see also Linear 
transformation), 112 
adjoint, 358-360 
characteristic polynomial, 249 
determinant, 258, 474, 476-477

diagonalizable, 245 

diagonalize, 247 

differential, 131 

differentiation, 131, 134-137 

eigenspace, 264, 401 

eigenvalue, 246, 371-374 

eigenvector, 246, 371-374 

elementary divisor, 539 

Householder operator, 397 

invariant subspace, 77-78, 313-315

isometry, 379 

Jordan canonical form, 483-516 

left shift, 76 

Lorentz transformation, 454—461 

minimal polynomial, 516-521 

nilpotent, 512 

normal, 370, 401-403 

orthogonal, 379-385, 472-478 

partial isometry, 394, 405 

positive definite, 377-378 

positive semidefinite, 377-378 

projection, 398-403 

projection on a subspace, 86, 
117 

projection on the x-axis, 66 

quotient space, 325-326 

rational canonical form, 526-548 

reflection, 66, 113, 117, 387, 472— 
478 

right shift, 76 

rotation, 66, 382, 387, 472-478 

self-adjoint, 373, 401-403 

simultaneous diagonalization, 282, 
405 

spectral decomposition, 402 

spectrum, 402 

unitary, 379-385, 403 


Linear space, see Vector space 
Linear transformation, (see also 


Linear operator), 65 
adjoint, 367 
composition, 86-89 
identity, 67 
image, see Range 
inverse, 99-102, 164-165 
invertible, 99-102
isomorphism, 102-105, 123, 425 
kernel, see Null space 
left-multiplication, 92-94 
linear functional, 119 
matrix representation, 80, 88— 
92, 347, 359 
null space, 67-69, 134-137 
nullity, 69-71 
one-to-one, 71 
onto, 71 
product with a scalar, 82 
pseudoinverse, 413 
range, 67-69 
rank, 69-71, 159 
restriction, 77—78 
singular value, 407 
singular value theorem, 406 
sum, 82 
transpose, 121, 126, 127 
vector space of, 82, 103 
zero, 67 
Local extremum, 439, 450 
Local maximum, 439, 450 
Local minimum, 439, 450 
Lorentz transformation, 454—461 
Lower triangular matrix, 229 


Markov chain, 291, 304 

Markov process, 291 

Matrix, 8 
addition, 9 
adjoint, 331, 359-360 
augmented, 161, 174 
change of coordinate, 112-115 
characteristic polynomial, 248 
classical adjoint, 208, 231 
coefficient, 169 
cofactor, 210, 232 
column of, 8 
column sum, 295 
companion, 526 
condition number, 469 
congruent, 426, 445, 451 
conjugate transpose, 331, 359— 

360 

consumption, 177 




convergence, 284-288 

determinant of, 200, 210, 232, 
367, 394 

diagonal, 18, 97 

diagonal entries of, 8 

diagonalizable, 246 

diagonalize, 247 

direct sum, 320-321, 496, 545 

eigenspace, 264 

eigenvalue, 246, 467-470 

eigenvector, 246 

elementary, 149-150, 159 

elementary divisor, 541 

elementary operations, 148 

entry, 8 

equality of, 9 

Euclidean norm, 467-470 

exponential of, 312, 515 

Gramian, 376 

Hessian, 440 

identity, 89 

incidence, 94-96, 98 

index, 445 

input-output, 177 

invariants, 445 

inverse, 100-102, 107, 161-164 

invertible, 100-102, 111, 223, 
469 

Jordan block, 483 

Jordan canonical form, 491 

limit of, 284-288 

lower triangular, 229 

minimal polynomial, 517-521 

multiplication with a scalar, 9 

nilpotent, 229, 512 

norm, 339, 467-470, 515 

normal, 370 

orthogonal, 229, 382-385 

orthogonally equivalent, 384-385 

permanent of a 2 × 2, 448

polar decomposition, 411-413 

positive definite, 377 

positive semidefinite, 377 

product, 87-94 

product with a scalar, 9 

pseudoinverse, 414 

rank, 152-159 
rational canonical form, 541

reduced row echelon form, 185, 
190-191 

regular, 294 

representation of a bilinear form, 
424-428 

representation of a linear trans- 
formation, 80, 88-92, 347, 
359 

row of, 8 

row sum, 295 

scalar, 258 

self-adjoint, 373, 467 

signature, 445 

similarity, 115, 118, 259, 508 

simultaneous diagonalization, 282 

singular value, 410 

singular value decomposition, 410 

skew-symmetric, 23, 229, 371 

square, 9 

stochastic, see Transition ma- 
trix 

submatrix, 230 

sum, 9 

symmetric, 17, 373, 384, 389, 
446 

trace, 18, 20, 97, 118, 259, 281, 
331, 393 

transition, 288-291, 515 

transpose, 17, 20, 67, 88, 127, 
224, 259 

transpose of a matrix inverse, 
107 

transpose of a product, 88 

unitary, 229, 382-385 

unitary equivalence, 384-385, 394, 472

upper triangular, 21, 218, 258, 
370, 385, 397 

Vandermonde, 230 

vector space, 9, 331, 425 

zero, 8 


Maximal element of a family of 


sets, 58 


Maximal linearly independent sub- 


set, 59-61 


Maximal principle, 59 
Member, see Element
Michelson-Morley experiment, 451
Minimal polynomial 
of a linear operator, 516-521 
of a matrix, 517-521 
uniqueness, 516 
Minimal solution to a system of 
linear equations, 364-365 
Monic polynomial, 567-569 
Multiplicative inverse of an ele- 
ment of a field, 553 
Multiplicity of an eigenvalue, 263 
Multiplicity of an elementary di- 
visor, 539, 541 


n-dimensional volume, 226 
n-linear function, 238-242 
n-tuple, 7 

equality, 8 

scalar multiplication, 8 

sum, 8 

vector space, 8 
Nilpotent linear operator, 512 
Nilpotent matrix, 229, 512 
Nonhomogeneous linear differen- 
tial equation, 142 
Nonhomogeneous system of linear 
equations, 171 
Nonnegative vector, 177 
Norm 

Euclidean, 467—470 

of a function, 339 

of a matrix, 339, 467-470, 515 

of a vector, 333-336, 339 
Normal equations, 368 
Normal linear operator or matrix, 

370, 401-403 

Normalizing a vector, 335 
Null space, 67-69, 134-137 
Nullity, 69-71 
Numerical methods 

conditioning, 464 

QR factorization, 396-397 


Odd function, 21, 355 
One-to-one function, 551 
One-to-one linear transformation, 71
Onto function, 551
Onto linear transformation, 71 
Open model of a simple economy, 
178-179 
Order 
of a differential equation, 129 
of a differential operator, 131, 
135 
Ordered basis, 79 
Orientation of an ordered basis, 
202 
Orthogonal complement, 349, 352, 
398-401 
Orthogonal equivalence of matrices, 
384-385 
Orthogonal matrix, 229, 382-385 
Orthogonal operator, 379-385, 472— 
478 
on R², 387-388
Orthogonal projection, 398-403 
Orthogonal projection of a vector, 
351 
Orthogonal subset, 335, 342 
Orthogonal vectors, 335 
Orthonormal basis, 341, 346-347, 
372 
Orthonormal subset, 335 


Parallel vectors, 3 
Parallelogram 
area of, 204 
law, 2, 337 
Parseval’s identity, 355 
Partial isometry, 394, 405 
Pendular motion, 143 
Penrose conditions, 421 
Periodic motion of a spring, 127, 
144 
Permanent of a 2 × 2 matrix, 448
Perpendicular vectors, see Orthog- 
onal vectors 
Physics 
Hooke’s law, 128, 368 
pendular motion, 143 
periodic motion of a spring, 144 
special theory of relativity, 451— 
461 




spring constant, 368 
Polar decomposition of a matrix, 
411-413 
Polar identities, 338 
Polynomial, 9 
annihilator of a vector, 524, 528 
auxiliary, 131, 134, 137-140 
characteristic, 373 
coefficients of, 9 
degree of a, 10 
division algorithm, 562 
equality, 10 
function, 10, 51-53, 569 
fundamental theorem of alge- 
bra, 482, 560 
homogeneous of degree two, 433 
irreducible, 525, 567-569 
Lagrange, 51, 109, 125 
Legendre, 346 
minimal, 516-521 
monic, 567-569 
product with a scalar, 10 
quotient, 563 
relatively prime, 564 
remainder, 563 
splits, 262, 370, 373 
sum, 10 
trigonometric, 399 
unique factorization theorem, 568 
vector space, 10 
zero, 9 
zero of a, 62, 134, 560, 564 
Positive definite matrix, 377 
Positive definite operator, 377-378 
Positive semidefinite matrix, 377 
Positive semidefinite operator, 377— 
378 
Positive vector, 177 
Power set, 59 
Preimage of an element, 551 
Primary decomposition theorem, 
545 
Principal axis theorem, 390 
Probability, see Markov chain 
Probability vector, 289 
fixed, 301 
initial, 292 
Product
of a bilinear form and a scalar, 
423 
of complex numbers, 556 
of elements of a field, 553
of a linear transformation and a scalar, 82
of matrices, 87-94 
of a matrix and a scalar, 9 
of a vector and a scalar, 7 
Projection 
on a subspace, 76, 86, 98, 117, 
398-403 
on the x-axis, 66
orthogonal, 398-403 
Proper subset, 549 
Proper value, see Eigenvalue 
Proper vector, see Eigenvector 
Pseudoinverse 
of a linear transformation, 413 
of a matrix, 414 
Pythagorean theorem, 337 
QR factorization, 396-397

Quadratic form, 389, 433-439 

Quotient of polynomials, 563 

Quotient space, 23, 58, 79, 109, 
325-326 


Range, 67-69, 551 
Rank 
of a bilinear form, 443 
of a linear transformation, 69— 
71, 159 
of a matrix, 152-159 
Rational canonical basis, 526 
Rational canonical form 
dot diagram, 535-539 
elementary divisor, 539, 541 
of a linear operator, 526-548 
of a matrix, 541 
uniqueness, 539 
Rayleigh quotient, 467 
Real part 
of a complex number, 556 
of a function, 129 
Reduced row echelon form of a 
matrix, 185, 190-191 


Reflection, 66, 117, 472-478 
of R², 113, 382-383, 387, 388
Regular transition matrix, 294 
Relation on a set, 551 
Relative change in a vector, 465 
Relatively prime polynomials, 564 
Remainder, 563 
Replacement theorem, 45-46 
Representation of a linear trans- 
formation by a matrix, 80 
Resolution of the identity opera- 
tor, 402 
Restriction 
of a function, 552 
of a linear operator on a sub- 
space, 77-78 
Right shift operator, 76 
Right-handed coordinate system, 
202 
Rigid motion, 385-387 
in the plane, 388 
Rotation, 66, 382, 387, 472-478 
Row of a matrix, 8 
Row operation, 148 
Row sum of matrices, 295 
Row vector, 8 
Rudin, Walter, 560 


Saddle point, 440 
Scalar, 7 
Scalar matrix, 258 
Scalar multiplication, 6 
Schur’s theorem 
for a linear operator, 370 
for a matrix, 385 
Second derivative test, 439-443, 
450 
Self-adjoint linear operator or ma- 
trix, 373, 401-403, 467 
Sequence, 11 
Set, 549-551 
chain, 59 
disjoint, 550 
element of a, 549 
empty, 549 
equality of, 549 
equivalence relation, 107, 394, 
449, 451 
equivalence relation on a, 551
intersection, 550 
linearly dependent, 36-40 
linearly independent, 37—40 
orthogonal, 335, 342 
orthonormal, 335 
power, 59 
proper subset, 549 
relation on a, 551 
subset, 549 
union, 549 
Signature 
of a bilinear form, 444 
of a matrix, 445 
Similar matrices, 115, 118, 259, 
508 
Simpson’s rule, 126 
Simultaneous diagonalization, 282, 
325, 327, 376, 405 
Singular value 
of a linear transformation, 407 
of a matrix, 410 
Singular value decomposition of a 
matrix, 410 
Singular value decomposition the- 
orem for matrices, 410 
Singular value theorem for linear 
transformations, 406 
Skew-symmetric matrix, 23, 229, 
371 
Solution 
of a differential equation, 129 
minimal, 364-365 
to a system of linear equations, 
169 
Solution set of a system of linear 
equations, 169, 182 
Solution space of a homogeneous 
differential equation, 132, 137— 
140 
Space-time coordinates, 453 
Span, 30, 34, 343 
Special theory of relativity, 451— 
461 
axioms, 453 
Lorentz transformation, 454—461 
space-time coordinates, 453 


time contraction, 459-461
Spectral decomposition, 402
Spectral theorem, 401
Spectrum, 402
Splits, 262, 370, 373
Spring, periodic motion of, 127,
144
Spring constant, 368
Square matrix, 9
Square root of a unitary operator,
393
Standard basis
for Fⁿ, 43
for Pₙ(F), 43
Standard inner product on Fⁿ, 330
Standard ordered basis
for Fⁿ, 79
for Pₙ(F), 79
Standard representation of a vec-
tor space, 104-105
States
absorbing, 304
of a transition matrix, 288
Stationary vector, see Fixed prob-
ability vector
Statistics, see Least squares ap-
proximation
Stochastic matrix, see Transition
matrix
Stochastic process, 291
Submatrix, 230
Subset, 549
linearly dependent, 36-40
linearly independent, 59-61
maximal linearly independent,
59-61
orthogonal, 335, 342
orthogonal complement of a, 349,
352, 398-401
orthonormal, 335
span of a, 30, 34, 343
sum, 22
Subspace, 16-19, 50-51
cyclic, 313-317
dimension of a, 50-51
direct sum, 22, 58, 98, 275-279,
318, 355, 366, 394, 398, 401,
475-478, 494, 545
generated by a set, 30
invariant, 77-78
sum, 275
zero, 16
Sum
of bilinear forms, 423
of complex numbers, 556
of elements of a field, 553
of functions, 9
of linear transformations, 82
of matrices, 9
of n-tuples, 8
of polynomials, 10
of subsets, 22
of vectors, 7
Sum of subspaces, (see also Direct
sum, of subspaces), 275
Sylvester’s law of inertia 
for a bilinear form, 443 
for a matrix, 445 
Symmetric bilinear form, 428-430, 
433-435 
Symmetric matrix, 17, 373, 384, 
389, 446 
System of differential equations, 
273, 516 
System of linear equations, 25-30, 
169 
augmented matrix, 174 
coefficient matrix, 169 
consistent, 169 
corresponding homogeneous sys- 
tem, 172 
equivalent, 182-183 
Gaussian elimination, 186-187 
general solution, 189 
homogeneous, 171 
ill-conditioned, 464 
inconsistent, 169 
minimal solution, 364-365 
nonhomogeneous, 171 
solution to, 169 
well-conditioned, 464 


T-annihilator, 524, 528 
T-cyclic basis, 526 


T-cyclic subspace, 313-317 
T-invariant subspace, 77-78, 313-315
Taylor’s theorem, 441 
Test for diagonalizability, 496 
Time contraction, 459-461 
Trace of a matrix, 18, 20, 97, 118, 
259, 281, 331, 393 
Transition matrix, 288-291, 515 
regular, 294 
states, 288 
Translation, 386 
Transpose
of an invertible matrix, 107 
of a linear transformation, 121, 
126, 127 
of a matrix, 17, 20, 67, 88, 127, 
224, 259 
Trapezoidal rule, 126 
Triangle inequality, 333 
Trigonometric polynomial, 399 
Trivial representation of zero vec- 
tor, 36-38 


Union of sets, 549 
Unique factorization theorem for 
polynomials, 568 
Uniqueness 
of adjoint, 358 
of coefficients of a linear com- 
bination, 43 
of Jordan canonical form, 500 
of minimal polynomial, 516 
of rational canonical form, 539 
of size of a basis, 46 
Unit vector, 335 
Unitary equivalence of matrices, 
384-385, 394, 472 
Unitary matrix, 229, 382-385 
Unitary operator, 379-385, 403 
Upper triangular matrix, 21, 218, 
258, 370, 385, 397 


Vandermonde matrix, 230 
Vector, 7 
additive inverse of a, 12 
annihilator of a, 524, 528 
column, 8 
coordinate, 80, 91, 110-111

fixed probability, 301 

Fourier coefficients, 119, 348, 400 

initial probability, 292 

linear combination, 24 

nonnegative, 177 

norm, 333-336, 339 

normalizing, 335 

orthogonal, 335 

orthogonal projection of a, 351 

parallel, 3 

perpendicular, see Orthogonal 
vectors 

positive, 177 

probability, 289 

product with a scalar, 8 

Rayleigh quotient, 467 

row, 8 

sum, 7 

unit, 335 

zero, 12, 36-38 

Vector space, 6 

addition, 6 

basis, 43-49, 192-194 

of bilinear forms, 424 

of continuous functions, 18, 67, 
119, 331, 345, 356 

of cosets, 23 

dimension, 47—48, 103, 119, 425 

dual, 119-123 

finite-dimensional, 46-51 

of functions from a set into a 
field, 9, 109, 127 

infinite-dimensional, 47 

of infinitely differentiable func- 
tions, 130-137, 247, 523 

isomorphism, 102-105, 123, 425 

of linear transformations, 82, 103 

of matrices, 9, 103, 331, 425 

of n-tuples, 8 

of polynomials, 10, 86, 109 

quotient, 23, 58, 79, 109 

scalar multiplication, 6 

of sequences, 11, 109, 356, 369 

subspace, 16-19, 50-51 

zero, 15 

zero vector of a, 12 
Volume of a parallelepiped, 226


Wade, William R., 439 
Well-conditioned system, 464 
Wilkinson, J. H., 397 
Wronskian, 232 


Z₂, 16, 42, 429, 553
Zero matrix, 8 
Zero of a polynomial, 62, 134, 560,
564 
Zero polynomial, 9 
Zero subspace, 16 
Zero transformation, 67 
Zero vector, 12, 36-38 
trivial representation, 36-38 
Zero vector space, 15 


